Kent Karlsson wrote:
> Philippe Verdy wrote:
> > If we count also the encoded modern LV and LVT johab syllables:
> >
> > ( ((Ls|Lm)+ (Vs|Vm)+) |
> >   ((Ls|Lm)* (LsVs|LsVm|LmVs|LmVm) (Vs|Vm)*) |
> >   ((Ls|Lm)* (LsVsTs|LsVmTs|LmVsTs|LmVmTs|
> >              LsVsTm|LsVmTm|LmVsTm|LmVmTm) ) ) (Ts|Tm)*
>
> I'm not even going to try to parse that...

What is complicated to read here? I used blanks to indent the terms that can match at the same level. If it is not clear enough to you, the rule expands to one of the three cases below:

- Hangul syllables coded only with jamos:
    (Ls|Lm)+ (Vs|Vm)+ (Ts|Tm)*
- Hangul syllables containing one "LV" precomposed johab:
    (Ls|Lm)* (LsVs|LsVm|LmVs|LmVm) (Vs|Vm)* (Ts|Tm)*
- Hangul syllables containing one "LVT" precomposed johab:
    (Ls|Lm)* (LsVsTs|LsVmTs|LmVsTs|LmVmTs|LsVsTm|LsVmTm|LmVsTm|LmVmTm) (Ts|Tm)*

In Hangul, any text coded with syllables from one of the last two sets is canonically equivalent to a text in the first set.

The problem is that the first set also contains texts that should be considered canonically equivalent to each other but are not (and never will be), because of the stability policy for normalized decompositions: there is no way to associate an "Lm" jamo with its "Ls" components so that they compare as canonically equivalent. The exception, of course, is UCA, where they may compare as equal, provided that UCA is updated to give them equal collation weights.
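
To make the last point concrete, here is a minimal sketch in Python using the standard unicodedata module. It assumes that "Ls" stands for a simple jamo and "Lm" for a compound (multiple) jamo, which is only one reading of the notation above. The helper name canonically_equivalent and the sample strings are illustrative, not part of any standard API.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent when their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Case 1: jamos only, L V T (U+1112, U+1161, U+11AB).
JAMO_ONLY = "\u1112\u1161\u11AB"
# Case 2: precomposed LV syllable U+D558 followed by the trailing jamo U+11AB.
LV_PLUS_T = "\uD558\u11AB"
# Case 3: precomposed LVT syllable U+D55C.
LVT = "\uD55C"

# Assumed "Lm" example: compound leading jamo U+1101 (SSANGKIYEOK),
# compared with two simple leading jamos U+1100 (KIYEOK) in sequence.
COMPOUND_L = "\u1101\u1161"
TWO_SIMPLE_L = "\u1100\u1100\u1161"

if __name__ == "__main__":
    # The three encodings of the same syllable are canonically equivalent.
    print(canonically_equivalent(JAMO_ONLY, LV_PLUS_T))      # True
    print(canonically_equivalent(JAMO_ONLY, LVT))            # True
    # The compound jamo has no canonical decomposition into simple jamos.
    print(canonically_equivalent(COMPOUND_L, TWO_SIMPLE_L))  # False
```

The first two comparisons succeed because the precomposed syllables decompose algorithmically into conjoining jamos. The third fails because U+1101 has no decomposition mapping at all, so normalization leaves both strings unchanged.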

