To find out when particular characters were added to Unicode, you can also consult the new DerivedAge.txt, currently in beta at:
http://www.unicode.org/Public/BETA/Unicode3.2/DerivedAge-3.2.0d2.txt

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, January 30, 2002 12:18
Subject: Re: Questions about Unicode history

> Marco,
>
> I'll answer as many of your questions as I can, and will
> cc this to the unicode list (in part to forestall a gazillion
> "Well, I think maybe X" responses).
>
> --Ken
>
> > - When did the Unicode project start, and who started it?
>
> The detailed history for this will soon be available on the
> Unicode website. The short answer is that Joe Becker (Xerox) and
> Lee Collins (Apple) were highly instrumental in getting the
> ball rolling on this, and the preliminary work they did,
> primarily on Han unification, dated from 1987.
>
> However, "the Unicode project" had many beginnings -- many points
> where you could mark a milestone in its early development. And
> the Unicode Consortium celebrated a number of 10-year
> anniversaries, starting from 1998 and continuing through last year.
>
> >
> > - Is it true Han Unification was the core of Unicode, and the idea of an
> > universal encoding come afterwards?
>
> The effort by Xerox and Apple to do a Han unification was key to
> the motivation that eventually led to a serious effort to actually
> *do* Unicode and then to establish the Unicode Consortium to
> standardize and promote it. However, the idea of a universal encoding
> predated that considerably. In some respects the Xerox Character Code
> Standard (XCCS) was a serious attempt at providing a universal
> character encoding (although it did not include a unified Han
> encoding, but only Japanese kanji).
> XCCS 2.0 (1980) contained, in
> addition to Japanese kanji: Latin (with IPA), Hiragana, Bopomofo, Katakana,
> Greek, Cyrillic, Runic, Gothic, Arabic, Hebrew, Georgian, Armenian,
> Devanagari, Hangul jamo, and a wide variety of symbols. The early
> Unicoders mined XCCS 2.0 heavily for the early drafts of Unicode 1.0,
> and always regarded it as the prototype for a universal encoding.
>
> Additionally, you have to consider that the beginning of the ISO project
> for a Multi-octet Universal Character Set (10646) predated the
> formal establishment of Unicode. Part of the impetus for the serious
> work to standardize Unicode was, of course, discontent with the
> then architecture of the early drafts of 10646.
>
> >
> > - Who and when invented the name "Unicode"?
>
> This one has a definitive answer: Joe Becker coined the term,
> for "unique, universal, and uniform character encoding", in 1987.
> First documented use is in December, 1987.
>
> >
> > - When did the ISO 10646 project start?
>
> Unfortunately, the document register for early WG2 documents doesn't
> have dates for all the early documents, and I don't have all the
> early documents to check. But...
>
> The 4th meeting of WG2 was held in London in February, 1986. The
> first three meetings were in Geneva, Turin, and London, respectively.
> That puts the likely timeframe for the Geneva meeting, and the
> establishment of WG2 by SC2 at about 1984. The *only* project for WG2
> was 10646.
>
> Some of the older oldtimers on the list may have more exact information
> about the early WG2 work.
>
> >
> > - When did Unicode and ISO 10646 merge?
>
> It wasn't a single date that can be pointed to, like the signing
> of an armistice. In some respects, Unicode and ISO 10646 are *still*
> merging, as modifications and amendments to deal with niggling little
> architectural edge cases are worked out.
>
> However the key dates were:
>
> January 3, 1991.
> Incorporation of the Unicode Consortium, which
> signalled to SC2 that the Unicoders were serious in their
> intentions.
>
> May, 1991. Meeting #19 of WG2 in San Francisco. An ad hoc meeting
> took place between WG2 members and some Unicoders, which paved
> the way for the later "merger" of the standards.
>
> June, 1991. The 10646 DIS 1 was defeated in its balloting. This left
> the only reasonable way forward an architectural compromise with
> the Unicode Standard, which at that point was in copy edit and
> about to go to press.
>
> June 3, 1991. The date of "10646M proposal draft to merge Unicode and
> 10646", by Ed Hart. This was a key document in the resulting
> merger of features.
>
> August, 1991. The Geneva WG2 meeting accepted Han unification, combining
> marks, dropped byte-by-byte restrictions on code values for UCS-2,
> and accepted Unicode repertoire additions. From that point forward,
> the overall aspect of what became ISO/IEC 10646-1:1993 was clear.
>
> >
> > - What is the name of the GB and JIS standards that have the same repertoire
> > as Unicode?
>
> GB 13000 has the same repertoire as ISO/IEC 10646-1:1993.
> JIS X 0221 has the same repertoire as ISO/IEC 10646-1:1993.
>
> Those two were effectively national publications of 10646. You can
> work out the correlations with Unicode from that.
>
> GB 18030:2000 in principle has the same repertoire (but different
> encoding) as ISO/IEC 10646-1:2000, i.e. the same as Unicode 3.0.
> (But there were small problems in it.) However, the 4-byte form
> of GB 18030 maps all Unicode code points, assigned or not, so
> it will (in theory, at least) always have the same repertoire
> as Unicode.
>
> >
> > - When did Unicode stop to be "16 bits"? (I.e., when were surrogates added?)
>
> In terms of publication, with Unicode 2.0 in 1996. However, the decision
> was taken by the UTC considerably before publication.
>
> Amendment 1 to 10646-1 (UTF-16) was proposed to WG2 in WG2 N970, dated
> 7 February 1994.
> Mark Davis was the project editor for that amendment.
>
> >
> > - I can't remember the version when some scripts were added: Syriac, Thaana,
> > Sinhala, Tibetan, Myanmar, Ethiopic, Cherokee, Canadian Syllabics, Ogham,
> > Runes, Khmer, Mongolian, Yi, Etruscan, Gothic, Deseret, CJK ext. A, CJK ext.
> > B.
>
> See pp. 968-969 of TUS 3.0.
>
> Tibetan was in Unicode 1.0, then was removed. It was re-added, in a
> new encoding, in Unicode 2.0.
>
> Syriac, Thaana, Sinhala, Myanmar, Ethiopic, Cherokee, Canadian Syllabics,
> Ogham, Runic, Khmer, Mongolian, Yi, CJK Extension A were added in
> Unicode 3.0.
>
> Old Italic (including Etruscan), Gothic, Deseret, and CJK Extension B
> were added in Unicode 3.1.
>
> > - Roughly, how many ideographs are in modern use in extensions A and B?
>
> Not many. I'll refer to the IRG experts to make a guess there.
>
> >
> > - Roughly, when will version 3.2 become official?
>
> March, 2002.
>
> >
> > - Roughly, when will the version 4 book be published?
>
> Currently still scheduled for March, 2003, but schedule slip is
> always a possibility on a major publication project like this.
>
> > I also have a few non-Unicode questions:
> >
> >
> > - When was ASCII first published and by whom?
>
> 1967. By ANSI X3.4.
>
> Actually, that was preceded by ASCII per se, the earliest form of
> which was published as a standard in 1963 by ASA (American Standards
> Association -- the predecessor to ANSI). But the 1963 version of ASCII
> had some differences from what we now know as ASCII.
>
> >
> > - What standard was current before ASCII? (BAUDOT, is it?) How many bits did
> > it use?
>
> I'll let the ancient computer and terminal mavens have at that
> one. There is lots of early character encoding history available
> on the web -- it's not too hard to find information about it,
> actually.
>
> >
> > - Did the ASCII standard expire, and when?
>
> No, it is still a standard.
>
> >
> > - When was ISO 646 published?
>
> 1972.
>
> >
> > - I think that ISO 646 expired. When?
>
> No, it is still a standard. The current version is the ISO-646-IRV,
> revised in 1991.
>
> >
> > - When was ISO 8859 published?
>
> It comes in many parts, each of which has a separate publication date.
>
> >
> > - When did the first double-byte encoding appear?
>
> Dunno. Maybe one of the IBMers will know when IBM first started
> implementing double-byte Asian character sets.
>
> --Ken

