Kent Karlsson wrote:
> Philippe Verdy wrote:
> ...
> > Here is what I have (this is just the part related to Hangul
> > jamos in the Johab set), presented in collation order:
> > # add canonical de/recomposition of "Johab" compound leading consonnant jamos in Hangul
> > # (there are 17 basic consonnants) in Hangul, IEUNG is used for KAPYEOUN-
> > #1100;HANGUL CHOSEONG KIYEOK;Lo;0;L;;;;;N;;G;;;
> > 1101;HANGUL CHOSEONG SSANGKIYEOK;Lo;0;L;<johab> 1100 1100;;;;N;;GG;;;
> ...
>
> When possible, I've preferred the "left associative" reading, just
> to make it easier for the recomposition. I don't think there is any
> linguistic reason for preferring the "right associative" reading for
> any of these. The current interpretation for doubled consonants is
> a modern one; I think the historic reading is different (but not
> quite sure exactly how).

Here also I have no good hint on which association is preferred, except the normative name. Of course this is just an intermediate decomposition, and it can be expanded before actual use. (In fact there are cases where this expansion directly to three letters is already needed because there is no corresponding pair, notably if we have to map some compatibility clusters to johab clusters, and so this view is just to simplify the editing of rules.)

> There are also some direct errors in your mappings (detailed below).
>
> 111B;HANGUL CHOSEONG KAPYEOUNRIEUL;Lo;0;L;<johab> 1105 114C;;;;N;;RQ;;;
> 111D;HANGUL CHOSEONG KAPYEOUNMIEUM;Lo;0;L;<johab> 1106 114C;;;;N;;MQ;;;
> 112C;HANGUL CHOSEONG KAPYEOUNSSANGPIEUP;Lo;0;L;<johab> 1108 114C;;;;N;;BBQ;;;
> 112B;HANGUL CHOSEONG KAPYEOUNPIEUP;Lo;0;L;<johab> 1107 114C;;;;N;;BQ;;;
> -----PLAIN WRONG, yesieung used instead of ieung

Thanks for pointing out these 3 errors. I did not see them despite rereading the file so many times, and checking the generated trace file, which displays actual characters and not just code points.

> 11F4;HANGUL JONGSEONG KAPYEOUNPHIEUPH;Lo;0;L;<johab> 11C1 11E6;;;;N;;pq;;;
> ------PLAIN WRONG, 11E6 instead of 11BC

This one is an obvious copy/paste error made when creating the rules. For the other two alts, I'll look to make them coherent with the left-associative rule used generally in canonical decompositions:

> 1122;HANGUL CHOSEONG PIEUP-SIOS-KIYEOK;Lo;0;L;<johab> 1107 112D;;;;N;;BSG;;;
> --- one of two alts, 1121 1100 preferable

For example, this rule should effectively be a simple extension of the rule in the previous line related to 1121. But thanks, these are not errors by themselves. I still have many tests to do with them, by comparing the results from various plain-text search operations that should find or exclude matches. Also, the file I gave you was the last one I had verified, and I have another version that includes more characters (notably the <narrow> decompositions).
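
To illustrate why the choice between the two alternatives for 1122 does not matter once the intermediate decomposition is fully expanded, here is a minimal sketch in Java. It is not the actual parser: the class name, the pairs() helper and the table layout are hypothetical. The entries for 1101 and 1122 come from the rules quoted above; the entries for 1121 (PIEUP-SIOS) and 112D (SIOS-KIYEOK) are assumed from the character names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JohabExpand {
    /** Builds a hypothetical pair table; only the entry for U+1122 depends on the reading. */
    static Map<Integer, int[]> pairs(boolean leftAssociative) {
        Map<Integer, int[]> m = new HashMap<>();
        m.put(0x1101, new int[] {0x1100, 0x1100}); // SSANGKIYEOK = KIYEOK KIYEOK
        m.put(0x1121, new int[] {0x1107, 0x1109}); // PIEUP-SIOS (assumed from the name)
        m.put(0x112D, new int[] {0x1109, 0x1100}); // SIOS-KIYEOK (assumed from the name)
        m.put(0x1122, leftAssociative
                ? new int[] {0x1121, 0x1100}       // (PIEUP-SIOS) KIYEOK
                : new int[] {0x1107, 0x112D});     // PIEUP (SIOS-KIYEOK)
        return m;
    }

    /** Recursively replaces a compound jamo by its parts until only basic jamos remain. */
    static List<Integer> expand(int cp, Map<Integer, int[]> table) {
        List<Integer> out = new ArrayList<>();
        int[] parts = table.get(cp);
        if (parts == null) {
            out.add(cp); // no rule: already a basic jamo
            return out;
        }
        for (int part : parts) {
            out.addAll(expand(part, table));
        }
        return out;
    }

    public static void main(String[] args) {
        for (boolean left : new boolean[] {true, false}) {
            StringBuilder sb = new StringBuilder();
            for (int cp : expand(0x1122, pairs(left))) {
                sb.append(String.format("%04X ", cp));
            }
            System.out.println((left ? "left:  " : "right: ") + sb.toString().trim());
        }
    }
}
```

Both readings expand to the same three-letter sequence, so the pair form only affects how easy the rules are to edit and recompose.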

In fact, it is your initial comment and the N1051 document that gave me the idea to reorder the rules in collation order for the Hangul script (before that they were in code point order, which was even more difficult to edit and verify manually). I have just adapted my parser to use a sorted map (a TreeMap in Java) instead of a Vector, just to generate a sorted list on output.

Thanks a lot.
Philippe.

(Oh! Your message came to the list, although I gave you my file in private with the authorization to copy it, so I suppose I can reply publicly here to this one, no? If this was an error, admit that it's sometimes difficult to reply to the right place when there's no instruction and the initial thread was public...) ;-)
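
On the sorted-map change mentioned above, a minimal sketch of the idea in Java, not the actual parser. The class name, the inKeyOrder() method and the choice of sort key are all assumptions: the key here is simply the expanded decomposition written as hex code points, standing in for whatever collation key the real tool uses. A Vector would keep the rules in insertion order; a TreeMap iterates in key order without a separate sort step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedRules {
    /**
     * Takes rules as {sortKey, codePoint} pairs in any order and returns the
     * code points in ascending key order, as produced by TreeMap iteration.
     */
    static List<String> inKeyOrder(String[][] rules) {
        Map<String, String> sorted = new TreeMap<>();
        for (String[] rule : rules) {
            sorted.put(rule[0], rule[1]);
        }
        return new ArrayList<>(sorted.values());
    }

    public static void main(String[] args) {
        // Hypothetical keys: the expanded decomposition of each jamo.
        String[][] rules = {
            {"1103", "1103"},      // TIKEUT
            {"1102 1100", "1113"}, // NIEUN-KIYEOK, decomposed as NIEUN KIYEOK
            {"1102", "1102"},      // NIEUN
        };
        for (String cp : inKeyOrder(rules)) {
            System.out.println(cp);
        }
    }
}
```

With these keys, 1113 is listed directly after 1102 instead of after 1103, which is the kind of grouping that code point order does not give.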

