On Wed, 3 Dec 2003, Philippe Verdy wrote:

> I just have another question for Korean: many jamos are in fact composed
> from other jamos: this is clearly visible both in their name and in their
> composed glyph. What would be the linguistic impact of decomposing them (not
> canonically!)? Do Koreans really learn these jamos without breaking them into
> their components? I think here about SSANG (double) consonants, or the
The Korean alphabet, invented in 1443 and announced in 1446, included 17
consonants and 11 vowels. Modern Korean uses 14 consonants and 10 vowels
(3 consonants and 1 vowel have become obsolete). The Korean 'ABC song'
enumerates only these (i.e. it doesn't include cluster/complex letters).
The vowel 'U+119E ARAE A ᆞ' was used until the early 20th century, when it
was 'officially' dropped in the draft standard of Korean orthography
published by the Korean Linguistic Society in 1933 [1], which became the
basis of both the South and North Korean orthographic standards after the
division of the country. See p. 6 of the PDF file (p. 2 in the actual
document) of the scanned copy of the draft standard for the list of Korean
letters along with their names (the upper left part of p. 6 of the PDF when
rotated counterclockwise by 90 degrees). All other letters are composed out
of them. A few additional consonants were used briefly to transcribe Chinese
phonemes in phonetic textbooks in the 15th century, but have not been used
otherwise.

Kent and I have written on several occasions that complex Korean letters
(Korean letter clusters) should have been made _canonically_ equivalent to
sequences of basic Korean letters. They were compatibility equivalents of
each other in Unicode 2.0, but even that compatibility equivalence was
removed instead of being upgraded to canonical equivalence. That's another
mistake in the Korean encoding in Unicode. In the first place, complex
Korean letters should not have been encoded at all, just as precomposed
syllables should not have been. With NFC/NFD frozen forever, it is now
impossible to rectify this.

> initial Y or final E of some vowels...
> Of course I won't be able to use such decomposition in Unicode, but would it
> be possible to use it in some private encoding created with a m:n charset
> mapping from/to Unicode?

That kind of composition/decomposition is necessary for linguistic analysis
of Korean. Search engines (e.g. Google), rendering engines and incremental
searches also need it. See

  http://i18nl10n.com/korean/jamo.html

(You need the UnBatang font, a GPL'd OpenType font for Korean available at
http://i18nl10n.com/fonts/UnBatang.ttf, and Mozilla on either Linux/Unix or
Windows. Uniscribe on XP can take advantage of Korean OpenType fonts, but
only to a limited extent. In particular, it doesn't support the kind of
equivalence I'm talking about here, so for Mozilla, even on Windows 2k/XP,
I had to build a custom composition routine.)

  http://i18nl10n.com/korean/jamocomp.html
  http://bugzilla.mozilla.org/show_bug.cgi?id=176315
  http://bugzilla.mozilla.org/show_bug.cgi?id=177877
  http://bugzilla.mozilla.org/show_bug.cgi?id=176290

Jungshik

[1] http://i18nl10n.com/korean/orth1933.pdf
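
P.S. To make the kind of private, non-canonical decomposition discussed
above concrete, here is a minimal Python sketch. It is only an illustration
under stated assumptions: the table COMPLEX_TO_BASIC and the function
split_jamo are hypothetical names made up for this example, the table is
deliberately incomplete (a few double consonants and complex vowels, no
trailing consonant clusters), and none of it comes from the Unicode
Character Database or from the Mozilla code mentioned above. The only
library used is Python's standard unicodedata module.

```python
import unicodedata

# Hypothetical, incomplete table: complex conjoining jamo mapped to
# sequences of basic jamo. This is NOT Unicode data, only an illustration.
COMPLEX_TO_BASIC = {
    "\u1101": "\u1100\u1100",        # SSANGKIYEOK -> KIYEOK KIYEOK
    "\u1104": "\u1103\u1103",        # SSANGTIKEUT -> TIKEUT TIKEUT
    "\u1108": "\u1107\u1107",        # SSANGPIEUP  -> PIEUP PIEUP
    "\u110A": "\u1109\u1109",        # SSANGSIOS   -> SIOS SIOS
    "\u110D": "\u110C\u110C",        # SSANGCIEUC  -> CIEUC CIEUC
    "\u1162": "\u1161\u1175",        # AE  -> A I
    "\u1166": "\u1165\u1175",        # E   -> EO I
    "\u116A": "\u1169\u1161",        # WA  -> O A
    "\u116B": "\u1169\u1161\u1175",  # WAE -> O A I
}


def split_jamo(text):
    """Return text as a sequence of basic jamo (private, non-canonical).

    NFD first turns precomposed syllables into conjoining jamo; the table
    then replaces each complex jamo it knows about with basic jamo.
    """
    nfd = unicodedata.normalize("NFD", text)
    return "".join(COMPLEX_TO_BASIC.get(ch, ch) for ch in nfd)


# Unicode itself gives U+1101 no decomposition mapping at all.
print(repr(unicodedata.decomposition("\u1101")))  # ''

# U+AE4C (SSANGKIYEOK + A) split down to basic letters.
print([f"U+{ord(c):04X}" for c in split_jamo("\uAE4C")])
# ['U+1100', 'U+1100', 'U+1161']
```

With a table along these lines, an incremental search for the basic letter
U+1100 could match text containing U+1101, because both sides are compared
in the split form rather than in NFC/NFD.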

