Hello,
I've always wondered why Hangul Conjoining medial vowels (U+1160-U+11A2) and trailing consonants (U+11A8-U+11F9) are classified as Lo (Letter, other) instead of Mn (Mark, nonspacing). A couple of days ago, a series of events finally prompted me to raise this issue in this forum. (For some background information, see <http://jshin.net/i18n/korean/hunmin.html>.) Here I'm trying to present my case for why that change is necessary. Hopefully, I'll make it convincing enough for the UTC to make the necessary changes.

TUS 3.0 (p. 53; it doesn't use regular-expression notation) defines a Hangul syllable as S := L+V+T*, where L, V, and T denote Hangul leading consonants, Hangul medial vowels, and Hangul trailing consonants, respectively, and '+' and '*' have their usual regular-expression meanings. An optional Hangul tone mark M (U+302E or U+302F) may be added, giving S := L+V+T*M?. U+302E and U+302F are classified as Mn.

I find it hard to understand why V and T are put into the Lo category instead of Mn, while vowels/vowel marks and 'subjoined' consonants in South and Southeast Asian scripts are put into Mn (or Mc in some cases). It seems to me that the rendering behavior of V and T, with the L(s) acting as the base character, is similar to that of vowels/vowel marks and subjoined consonants in South and Southeast Asian scripts, with the head consonant(s) acting as the base character. Just as vowels/vowel marks and subjoined consonants must be kept together with the head consonants they follow (e.g. they should not be broken across two lines), V and T (and M) have to be kept together with L.

Moreover, applications like terminal emulators should treat V and T as having zero screen width and allocate a sequence of L, V, and T the same screen width as L (for Hangul Jamos, that is double width). That requirement is very similar to what's required of a sequence of a head consonant, one or more subjoined consonants (as found in Tibetan), and one or more vowels/vowel marks in South and Southeast Asian scripts.
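To make the syllable structure concrete, here is a minimal Python sketch that finds the runs matching S := L+V+T*M? in a string of conjoining Jamos, i.e. the units that must be kept together (not broken across lines, measured as one cell group). The V, T and M ranges follow the ones given above; the L range U+1100-U+115F (including the choseong filler) is my assumption, and the names SYLLABLE and split_syllables are purely illustrative.

```python
import re

# Assumed L range: U+1100-U+115F (leading consonants plus choseong filler).
L = "[\u1100-\u115F]"
# V: U+1160-U+11A2 (medial vowels, including the jungseong filler U+1160).
V = "[\u1160-\u11A2]"
# T: U+11A8-U+11F9 (trailing consonants).
T = "[\u11A8-\u11F9]"
# M: optional tone mark, U+302E or U+302F.
M = "[\u302E\u302F]"

# S := L+V+T*M?  (hypothetical name)
SYLLABLE = re.compile(f"{L}+{V}+{T}*{M}?")


def split_syllables(text):
    """Return, in order, each run of text that matches L+V+T*M?."""
    return [m.group(0) for m in SYLLABLE.finditer(text)]
```

For example, U+1100 U+1161 U+11A8 U+1102 U+1161 splits into two syllables, and a sequence with two leading consonants (the 'L{2,}V+T*' case) still comes out as a single unit.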
An implementation like Markus Kuhn's wcwidth.c generates its width table automatically from UnicodeData.txt, but it has to make an exception for Hangul medial vowels and trailing consonants (see http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c). The ICU function u_charCellWidth() likewise returns U_ZERO_WIDTH for Hangul vowels and trailing consonants, whereas it returns U_FULL_WIDTH for Hangul leading consonants (http://oss.software.ibm.com/icu/apiref/uchar_8h.html#a360). It's to their credit that they noticed the need to make an exception for Hangul Jamos, but I'm afraid some implementations may blindly rely on UnicodeData.txt. Some developers may feel uncomfortable deviating from (what they perceive as) the Unicode standard even when asked (by me or others) to give Hangul Conjoining Jamos special treatment. To avoid the problems that could arise from this, I believe the changes I'm suggesting are necessary.

Although assigning Mn to Hangul vowels and trailing consonants appears to pose little problem, Hangul leading consonants don't seem to fit the definition of any existing category. When an L is used at the beginning of the sequence for a syllable (represented with 'LVT?'), it can be Lo. However, if multiple L's are used in the Jamo sequence for a given syllable (that is, 'L{2,}V+T*'), all but the first one are combining/non-spacing. I think the same problem exists for consonants in those South and Southeast Asian scripts in which consonants are encoded only once (i.e. subjoined consonants are not encoded separately, unlike Tibetan, where they are). For instance, Devanagari consonants (U+0915 - U+0939) are Lo although they can be combining when they're not the first consonant in a syllable. Given this, I believe assigning Lo to Hangul Conjoining leading consonants can be justified unless the UTC decides to adopt a more fine-grained category scheme than the current one.
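Here is a rough Python sketch of the kind of category-driven width computation described above, showing why the Hangul exception is needed. It is not the code of wcwidth.c or of ICU's u_charCellWidth(); it simply assumes that Mn/Me characters are zero width, East Asian Wide/Fullwidth characters take two cells, and everything else takes one, using Python's unicodedata module. The function names are hypothetical.

```python
import unicodedata

# Hangul medial vowels and trailing consonants, U+1160-U+11FF, as in the proposal.
HANGUL_VT = range(0x1160, 0x1200)


def naive_width(ch):
    """Width derived only from General_Category and East_Asian_Width (simplified)."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # combining marks take no cell of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide / fullwidth characters take two cells
    return 1


def cell_width(ch):
    """Like naive_width, but with the special case for Hangul V and T."""
    if ord(ch) in HANGUL_VT:
        return 0  # V and T ride on the preceding L, like a nonspacing mark
    return naive_width(ch)


def string_width(s, width=cell_width):
    """Total number of terminal cells for s under the given per-character width."""
    return sum(width(ch) for ch in s)
```

With the exception, the syllable U+1100 U+1161 U+11A8 occupies two cells, the width of its leading consonant. With naive_width alone, the Lo classification makes the vowel and the trailing consonant add extra cells. If V and T were Mn, the first branch would already return 0 and no special case would be needed.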
In summary, I propose that the general category of Hangul Conjoining medial vowels and trailing consonants (U+1160 - U+11FF) be changed from Lo (Letter, other) to Mn (Mark, nonspacing), so that their classification is consistent with, and meets, the rendering and other requirements of Hangul Conjoining Jamos.

Thank you in advance for considering my suggestion,

Jungshik Shin

P.S. The following image may be quite suggestive of what I wrote above: http://chem.skku.ac.kr/~wkpark/trash/xuhpulm.png

P.S.2: The fact that exactly the same technique described in the section 'Thai rendering behavior' of the following summary of Thai support in GNU/Linux/X has been used for Hangul rendering is another indication that Hangul Jamos should be treated the same way South and Southeast Asian scripts are: ftp://ftp.nectec.or.th/pub/thailinux/cvs/docs/thaisupp/thaisupp.html

