André Schappo asked:

> Been looking at http://www.unicode.org/Public/UNIDATA/Jamo.txt
> 
> There appears to be 2 different romanizations at play in the file? One for the
> short name and another for the full name
> eg 1100; G   # HANGUL CHOSEONG KIYEOK
> 
> I have searched unicode.org but cannot find appropriate documentation. Can
> anyone point me to definitive documentation?

Yes, but you're not gonna like the answer. ;-)

This situation for Hangul romanizations derives from the extensive haggling 
over Hangul syllable encoding which extended through 1992 (the crucial time for 
the Unicode/10646 "merger" process), and then continued in 1995, when Amd 5 to 
10646 resulted in the re-encoding of the Hangul syllables into the U+AC00 block 
of 11,172 syllables we currently have in the standard.

Unicode 1.0 had no conjoining jamo, but essentially just encoded the entirety 
of what was then known as KS C 5601 (now KS X 1001-1992) as compatibility 
characters. The names all used the "G" Romanization conventions. So U+3131 was 
HANGUL LETTER GIYEOG. The KS C 5601 compatibility Hangul syllable blocks (2350 
of them) were encoded in the range U+3400..3D2D. The standard was silent about 
the naming conventions for those syllables, because it neither listed them 
explicitly, nor gave a rule for their names. However, from the naming 
conventions for the *circled* Hangul syllable characters, it is clear what the 
intent was. The syllables, if they had been spelled out, would all have used 
the "G" Romanization conventions. So U+3400, in principle, was "HANGUL GA", 
U+3401 "HANGUL GAG", etc., through the set. So, to summarize, for Unicode 1.0, 
the situation was:

U+3131 HANGUL LETTER GIYEOG
U+3132 HANGUL LETTER SSANG GIYEOG
...
U+3400 HANGUL GA
U+3401 HANGUL GAG
...
U+3D2D HANGUL HING

All that was completely revised during what turned into seat-of-the-pants, 
late-night, up-against-the-deadline negotiations (where have we encountered 
that recently?) during the July, 1992 WG2 meeting in Seoul. The "appropriate 
documentation" is all contained in the WG2 document register, but it is hard to 
find nowadays, because 1992 was long before WG2 maintained its document 
register online. For the record, here are the relevant documents:

N764 Minutes of Unicode Korean Subcommittee meeting, 01-Oct, 1991
N828 Comments of Republic of Korea on ISO/DIS 10646-1.2(1992) [1992.06.28]
N840 Proposal for disposition of Korean Comments to DIS 1.2 [1992.06.27]
N848 Modified Korean Position [1992.07.02]
N852 2nd Proposal for disposition of Korean Comments to DIS 1.2 [1992.07.02]
N860 Details of Korean Jamo Combining Rules [1992.07.02]
N861 China's Position on Hangul in UCS [1992.07.02]
N864 Modification of Korean Position (2) [1992.07.03]
N868 Revisions to Korean [1992.07.03]

The net outcome of this was to introduce the U+1100 block of conjoining jamo 
letters, extend the block of Hangul syllables (U+3D2E..U+4DFF), and change the 
names of everything. All names were changed to use the "K" Romanization 
conventions. The detailed results can be found in ISO/IEC 10646-1:1993 or in 
Unicode 1.1, but as both of those are *also* difficult to obtain these days, 
here is a summary of the outcome:

U+1100 HANGUL CHOSEONG KIYEOK
U+1101 HANGUL CHOSEONG SSANGKIYEOK
...
U+3131 HANGUL LETTER KIYEOK
U+3132 HANGUL LETTER SSANGKIYEOK
...
U+3400 HANGUL SYLLABLE KIYEOK A
U+3401 HANGUL SYLLABLE KIYEOK A KIYEOK
...
U+3D2D HANGUL SYLLABLE HIEUH I IEUNG
U+3D2E HANGUL SYLLABLE KIYEOK A SSANGKIYEOK
...
U+4DFF HANGUL SYLLABLE MIEUM WEO RIEUL-THIEUTH

In 1995, there was another complete revolution in the Hangul encoding, as the 
pressure was on to include the *entire* set of 11,172 syllables, in the same 
predictable order we now see in the standard, rather than as a compatibility 
block from KS C 5601, filled out by extensions. The relevant documents in WG2 
were:

N1158 Korean National Position for adding Hangul characters [1995.02.27]
N1170 Canadian Position on Korean Proposal in N 1158 for adding Hangul 
characters [1995.03.10]
N1198 Working Draft for a proposed draft amendment to ISO/IEC 10646-1:1993 
[1995.04.05]
N1199 Background on Korean Coding [1995.04.05]
N1209 Proposed text of pDAM 5 to 10646, Hangul Character Collection
N1265 Report on Letter Ballot of PDAM5 to ISO/IEC 10646-1 (Hangul): Proposed 
Disposition of Comments from National Bodies [1995.09.26]
N1285 Hangul Syllable Character Name Generation Algorithm

This amendment resulted in removal of all the Hangul syllables in the range 
U+3400..U+4DFF, and replacement by the current block of Hangul syllables at 
U+AC00..U+D7AF. A key aspect of this change was that the set of 11,172 was an 
algorithmic ordering of the syllables. There was an argument at the time, but 
in the end, it was decided that the *names* of the characters shouldn't be 
maintained as a hand-edited list of 11,172 names, but could be defined 
algorithmically. N1285 defined that algorithm. It also went back to the 
principle that the names for syllables (as in other cases in the standard) 
would better be handled as romanizations of the pronunciation of the syllables, 
rather than by spelling out sequences of letters. So we ended up with the jamo 
short names (printed in the U+1100 block in the case of 10646 back then), and 
names for syllables very reminiscent of Unicode 1.0 names. (Note this did *not* 
change anything about the then-existing standard names for the U+1100 
block of conjoining jamos or the U+3130 block of compatibility jamos.) In 
summary:

U+AC00 HANGUL SYLLABLE GA
U+AC01 HANGUL SYLLABLE GAG
...
U+D7A3 HANGUL SYLLABLE HIH
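
For anyone who wants to see how the jamo short names feed into the syllable 
names, here is a minimal sketch of the name generation in Python. It assumes 
the usual arithmetic decomposition of the U+AC00 block (19 leading consonants, 
21 vowels, 28 trailing slots including "none"), with the short names 
transcribed from Jamo.txt. The function name hangul_syllable_name and the 
table names are mine, for illustration only; they do not come from N1285.

```python
# Sketch: derive a Hangul syllable name from the jamo short names.
# Table contents are transcribed from Jamo.txt; all identifiers here are
# illustrative, not taken from any WG2 document.

S_BASE = 0xAC00            # first precomposed syllable
V_COUNT = 21               # number of vowels
T_COUNT = 28               # number of trailing consonants, plus "none"
N_COUNT = V_COUNT * T_COUNT
S_COUNT = 19 * N_COUNT     # 11,172 syllables in total

# Short names for leading consonants U+1100..U+1112 (IEUNG is empty).
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]

# Short names for vowels U+1161..U+1175.
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]

# Short names for trailing consonants U+11A8..U+11C2; index 0 means none.
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(code_point: int) -> str:
    """Return the character name for a syllable in U+AC00..U+D7A3."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{code_point:04X} is not a Hangul syllable")
    l_index = s_index // N_COUNT
    v_index = (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    return ("HANGUL SYLLABLE "
            + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index])


if __name__ == "__main__":
    for cp in (0xAC00, 0xAC01, 0xD7A3):
        print(f"U+{cp:04X} {hangul_syllable_name(cp)}")
```

Note that U+1100 contributes the short name "G" to the syllable names even 
though its own character name is still HANGUL CHOSEONG KIYEOK, which is 
exactly the mismatch the original question was about.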

So that is how we ended up with one set of romanizations for the jamo 
characters and another for the Hangul syllables. As for many Unicode "just so" 
stories, there isn't a convenient documentation of this in the standard itself, 
but if you want, you can bookmark this summary once it appears in the Unicode 
mail archives. ;-)

--Ken


