RE: Compression through normalization

Philippe Verdy Thu, 04 Dec 2003 09:35:59 -0800

Kent Karlsson wrote:
> Philippe Verdy wrote:
> >     If we count also the encoded modern LV and LVT johab syllables:
> >     
> >     ( ((Ls|Lm)+                       (Vs|Vm)+) |
> >       ((Ls|Lm)* (LsVs|LsVm|LmVs|LmVm) (Vs|Vm)*) |
> >       ((Ls|Lm)* (LsVsTs|LsVmTs|LmVsTs|LmVmTs|
> >                  LsVsTm|LsVmTm|LmVsTm|LmVmTm) ) ) (Ts|Tm)*
> 
> I'm not even going to try to parse that...


What is complicated to read here? I used spaces to indent
the alternatives that can match at the same level.

If it is still not clear, the rule expands into one of the
three cases below:

- Hangul syllables coded only with conjoining jamos:
(Ls|Lm)+ (Vs|Vm)+ (Ts|Tm)*

- Hangul syllables containing one precomposed "LV" johab syllable:
(Ls|Lm)* (LsVs|LsVm|LmVs|LmVm) (Vs|Vm)* (Ts|Tm)*

- Hangul syllables containing one precomposed "LVT" johab syllable:
(Ls|Lm)* (LsVsTs|LsVmTs|LmVsTs|LmVmTs|LsVsTm|LsVmTm|LmVsTm|LmVmTm) (Ts|Tm)*
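Since Ls and Lm (and likewise Vs/Vm, Ts/Tm) only ever appear as
alternatives of each other, the three cases can be checked mechanically
once each code point is mapped to a class letter. The sketch below is an
illustration only: it assumes the ranges of the basic Hangul Jamo block
(leading U+1100..U+115F, vowels U+1160..U+11A7, trailing U+11A8..U+11FF)
plus the precomposed syllables U+AC00..U+D7A3, ignores the Jamo
Extended blocks, and the names jamo_class and is_syllable are hypothetical.

```python
import re

S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28

def jamo_class(ch: str) -> str:
    """Map one code point to L, V, T, P (precomposed LV), Q (precomposed LVT) or X."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"   # leading consonant (Ls or Lm)
    if 0x1160 <= cp <= 0x11A7:
        return "V"   # vowel (Vs or Vm)
    if 0x11A8 <= cp <= 0x11FF:
        return "T"   # trailing consonant (Ts or Tm)
    if S_BASE <= cp < S_BASE + S_COUNT:
        # A precomposed syllable is LV exactly when its T index is 0.
        return "P" if (cp - S_BASE) % T_COUNT == 0 else "Q"
    return "X"       # anything else

# The three cases above, with the s/m distinction collapsed:
#   L+ V+ T*   |   L* LV V* T*   |   L* LVT T*
SYLLABLE = re.compile(r"(?:L+V+|L*PV*|L*Q)T*")

def is_syllable(text: str) -> bool:
    classes = "".join(jamo_class(c) for c in text)
    return SYLLABLE.fullmatch(classes) is not None
```

For example, is_syllable("\uAC01\u11A8") accepts an LVT syllable followed
by an extra trailing jamo, while is_syllable("\uAC01\u1161") rejects it,
because no vowel may follow a precomposed LVT syllable.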

In Hangul, any text coded with one of the last two sets of
syllables is canonically equivalent to a text in the first set.
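This equivalence is purely arithmetic: each precomposed syllable
decomposes into one L, one V and an optional T jamo. The sketch below
follows the Hangul syllable decomposition described in the Unicode
Standard (section 3.12, Conjoining Jamo Behavior) and cross-checks it
against Python's unicodedata; the function name decompose_syllable is
hypothetical.

```python
import unicodedata

# Constants of the Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables

def decompose_syllable(ch: str) -> str:
    """Return the L V (T) jamo sequence of a precomposed syllable, else ch unchanged."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l = chr(L_BASE + s_index // N_COUNT)
    v = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # t_index == 0 means an "LV" syllable with no trailing consonant.
    return l + v + (chr(T_BASE + t_index) if t_index else "")

# The arithmetic agrees with NFD for every precomposed syllable.
agrees = all(
    decompose_syllable(chr(cp)) == unicodedata.normalize("NFD", chr(cp))
    for cp in range(S_BASE, S_BASE + S_COUNT)
)
print(agrees)  # True
```

NFC reverses the mapping, so "\u1100\u1161\u11A8" and "\uAC00\u11A8"
both normalize to the single syllable "\uAC01".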

The problem is that the first set also contains texts that should be
considered canonically equivalent but are not (and never will be),
because of the stability policy for normalized decompositions:
there is no way to map an "Lm" jamo to its "Ls" components
so that they compare as canonically equivalent (except, of course,
in the UCA, where they may compare as equal, provided that the UCA
is updated to give them equal collation weights).
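The gap can be observed with any normalizer. Assuming that "Lm" denotes a
compound jamo such as U+1101 HANGUL CHOSEONG SSANGKIYEOK and that its "Ls"
components are two U+1100 HANGUL CHOSEONG KIYEOK, the two spellings below
stay different after normalization; only a tailored collation could make
them compare equal.

```python
import unicodedata

# Compound leading consonant (Lm) versus the same consonant spelled as Ls Ls.
compound = "\u1101\u1161"        # SSANGKIYEOK + A
doubled = "\u1100\u1100\u1161"   # KIYEOK + KIYEOK + A

# U+1101 has no decomposition mapping, so NFD cannot split it into two KIYEOKs.
print(repr(unicodedata.decomposition("\u1101")))  # ''

nfc_compound = unicodedata.normalize("NFC", compound)
nfc_doubled = unicodedata.normalize("NFC", doubled)
print(nfc_compound == "\uAE4C")         # True: precomposed syllable KKA
print(nfc_doubled == "\u1100\uAC00")    # True: only the last L+V pair composes
print(nfc_compound == nfc_doubled)      # False: not canonically equivalent
```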

