Jungshik Shin wrote:
> Sorting Hangul letters (Jamos) according to the current version
> of allkeys.txt is rather like sorting Latin letters according to
> the Unicode 4.0 code points. Because this is well known, UTS #10
> goes to some length to explain how to properly sort Hangul letters
> (Jamos). However, as it stands, there are a few issues to be clarified.
>
> In mid-May this year, after a proposed update of UTS #10 had been
> posted, there was a thread of discussion about the treatment of Hangul
> letters (Jamos) in UCA. In the thread, I raised the following issues
> (the interleaving issue, and the different treatment of cluster jamos
> depending on whether they're given separate code points of their own
> in the U+1100 block or have to be represented as sequences of encoded
> Jamos).
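The cluster-jamo issue can be made concrete. U+1101 HANGUL CHOSEONG SSANGKIYEOK has a code point of its own, but the same double consonant can also be spelled as U+1100 U+1100, and Unicode gives the cluster jamo no canonical decomposition, so normalization alone will not equate the two spellings. Below is a minimal Python sketch of the expansion approach (point 2 of the summary below), assuming a deliberately tiny, hypothetical table of three cluster jamos; `CLUSTER_EXPANSIONS` and `expand_clusters` are illustrative names, and a real table would have to cover every cluster jamo in the block.

```python
# Hypothetical, partial table: a few precomposed cluster jamos mapped to
# the basic jamos they are built from. A complete table would need an
# entry for every cluster jamo in the U+1100 block.
CLUSTER_EXPANSIONS = {
    "\u1101": "\u1100\u1100",  # CHOSEONG SSANGKIYEOK -> KIYEOK + KIYEOK
    "\u116A": "\u1169\u1161",  # JUNGSEONG WA -> O + A
    "\u11AA": "\u11A8\u11BA",  # JONGSEONG KIYEOK-SIOS -> KIYEOK + SIOS
}


def expand_clusters(jamo_text: str) -> str:
    """Replace each known cluster jamo with its basic-jamo sequence."""
    return "".join(CLUSTER_EXPANSIONS.get(ch, ch) for ch in jamo_text)


# Both spellings of the same syllable now yield the same jamo string,
# so any later weighting step treats them identically.
precomposed = expand_clusters("\u1101\u1161")      # SSANGKIYEOK + A
sequence = expand_clusters("\u1100\u1100\u1161")   # KIYEOK + KIYEOK + A
```

With the expansion done first, the collation weights only ever need to be defined for basic jamos, at the cost of longer sort keys.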
You may wish to look at http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051-hangulsort.pdf, which contains a much updated version of my paper on the subject. The table entries are also available in plain-text form at http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051t-table-hangulctt6.txt (the "28" at the end is spurious...)

> After a thread of emails exchanged, Mark Davis and I found that both
> of us are more or less on the same page as to how Hangul letters
> should be collated. In summary,
>
> 1. Weights for T, V, and L should be assigned in such a way that
> T < V < L for all T's, V's, and L's

That would be L < T < V; but that is complicated by the actual need for the (superficially contradictory) V < L < T < V, with the latter T and V placed after all scripts. The Vs at two radically different positions in the table are for different positions of the V in a syllable: V < L is for the first V in a syllable, and T < V is for non-first Vs in a syllable.

> 2. Expand precomposed (cluster) Jamos into sequences of component
> basic Jamos

This is needed to cover all combinations of Jamos. If the repertoire is limited to (a superset of) the modern Jamos, this expansion can be avoided. For details, see my paper referenced above, which lists the weightings and contractions needed to avoid this expansion in many (but not all) cases.

> 3. Terminate every syllable with 'TERM' that has a lower weight than
> all T's (there's an alternative to this, but both of us favor this
> over the alternative)

This can be avoided if the weighting is done in a particular way. See my paper for details.

/kent k
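For readers weighing the two positions, here is a minimal Python sketch of the TERM-based scheme summarized in points 1 to 3 (the one Kent says his weighting avoids). The numeric weights, the four-letter jamo tables, and the helper names `role` and `sort_key` are illustrative assumptions, not values from allkeys.txt or UTS #10. The sketch shows why TERM < T < V < L makes a plain jamo-string comparison agree with syllable-by-syllable order: 가나 < 각 because TERM < T, 곡 < 과 because T < V (ㅘ expanded to ㅗ + ㅏ), and 가 < 까 because V < L (ㄲ expanded to ㄱ + ㄱ).

```python
# Hypothetical weights for a handful of basic jamos. Only the relative
# order matters: TERM < every T < every V < every L.
TERM = 0
L_ORDER = ["\u1100", "\u1102"]  # choseong KIYEOK, NIEUN
V_ORDER = ["\u1161", "\u1169"]  # jungseong A, O
T_ORDER = ["\u11A8", "\u11AB"]  # jongseong KIYEOK, NIEUN

WEIGHTS = {}
for base, order in ((1, T_ORDER), (100, V_ORDER), (200, L_ORDER)):
    for i, jamo in enumerate(order):
        WEIGHTS[jamo] = base + i


def role(ch: str) -> str:
    """Classify a conjoining jamo as L, V, or T by its range in U+1100..U+11FF."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"
    raise ValueError(f"not a conjoining jamo: U+{cp:04X}")


def sort_key(jamos: str) -> list[int]:
    """Weight of each jamo, plus TERM appended at the end of every syllable.

    Assumes well-formed L+ V+ T* syllables built only from basic jamos,
    i.e. cluster jamos have already been expanded.
    """
    key = []
    for i, ch in enumerate(jamos):
        key.append(WEIGHTS[ch])
        nxt = jamos[i + 1] if i + 1 < len(jamos) else None
        # A syllable ends after a V or T followed by an L or by nothing.
        if role(ch) != "L" and (nxt is None or role(nxt) == "L"):
            key.append(TERM)
    return key


examples = [
    "\u1100\u1169\u1161",        # 과: KIYEOK, O, A (WA expanded)
    "\u1100\u1161\u11A8",        # 각
    "\u1102\u1161",              # 나
    "\u1100\u1161\u1102\u1161",  # 가나
    "\u1100\u1100\u1161",        # 까: KIYEOK, KIYEOK, A (SSANGKIYEOK expanded)
    "\u1100\u1169",              # 고
    "\u1100\u1161",              # 가
    "\u1100\u1169\u11A8",        # 곡
]
ordered = sorted(examples, key=sort_key)
# ordered corresponds to 가, 가나, 각, 고, 곡, 과, 까, 나
```

Dropping TERM from this sketch makes 각 sort before 가나 (T < L at the third position), which is the problem that either TERM or a weighting like L < T < V has to solve.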

