Here is a summary of all the answers I received to my "historical" questions.
Sorry for the length of this post, but I think that many people will find this worth reading. Thanks again to all the people who took the time to reply.

_ Marco

--- --- --- ---

Q: When did the Unicode project start, and who started it?

A: [Magda Danish] I am currently working on a few web pages that talk about the Unicode history.

A: [Mark Davis] While we will continue to flesh out and improve these pages, the initial versions are publicly available, under "Historical Data" on: <http://www.unicode.org/unicode/consortium/consort.html>

A: [Kenneth Whistler] The short answer is that Joe Becker (Xerox) and Lee Collins (Apple) were highly instrumental in getting the ball rolling on this, and the preliminary work they did, primarily on Han unification, dated from 1987. However, "the Unicode project" had many beginnings -- many points where you could mark a milestone in its early development. And the Unicode Consortium celebrated a number of 10-year anniversaries, starting from 1998 and continuing through last year.

A: [Joseph Becker] Don't forget Mark Davis (then of Apple), who was more than highly instrumental in getting the ball rolling! And don't forget my "Unicode '88" manifesto, which was the clear, intentional inception of Unicode as a specific initiative. I drafted it in February 1988 after the enthusiastic reception of my Unicode proposal at Uniforum, its final draft being August 1988. Since the Consortium has in fact handed it out as marking the start of Unicode, I think its mention might be clarified in our official history, which currently says: "September 1988 ... Becker later presents paper on Unicode to ISO WG2."

A: [Nelson H. F. Beebe] I remember reading this article more than 15 years ago, and being impressed by the possibilities that it represented:

@String{j-SCI-AMER = "Scientific American"}

@Article{Becker:1984:MWP,
  author =       "Joseph D. Becker",
  title =        "Multilingual Word Processing",
  journal =      j-SCI-AMER,
  volume =       "251",
  number =       "1",
  pages =        "96--107",
  month =        jul,
  year =         "1984",
  CODEN =        "SCAMAC",
  ISSN =         "0036-8733",
  bibdate =      "Tue Feb 18 10:44:43 MST 1997",
  bibsource =    "Compendex database",
  abstract =     "The advantages of computerized typing and editing are
                 now being extended to all the living languages of the
                 world. Even a complex script such as Japanese or Arabic
                 can be processed.",
  acknowledgement = ack-nhfb # " and " # ack-rc,
  affiliationaddress = "Xerox Office Systems Div, Palo Alto, CA, USA",
  classification = "723",
  journalabr =   "Sci Am",
  keywords =     "Character Sets; data processing; word processing",
}

It was followed up by this more formal one:

@String{j-CACM = "Communications of the ACM"}

@Article{Becker:1987:AWP,
  author =       "Joseph D. Becker",
  title =        "{Arabic} word processing",
  journal =      j-CACM,
  volume =       "30",
  number =       "7",
  pages =        "600--610",
  month =        jul,
  year =         "1987",
  CODEN =        "CACMA2",
  ISSN =         "0001-0782",
  bibdate =      "Thu May 30 09:41:10 MDT 1996",
  bibsource =    "http://www.acm.org/pubs/toc/",
  URL =          "http://www.acm.org/pubs/toc/Abstracts/0001-0782/28570.html",
  acknowledgement = ack-nhfb,
  keywords =     "algorithms; design; documentation; human factors; measurement",
  review =       "ACM CR 8902-0084",
  subject =      "{\bf H.4.1}: Information Systems, INFORMATION SYSTEMS
                 APPLICATIONS, Office Automation, Word processing. {\bf J.5}:
                 Computer Applications, ARTS AND HUMANITIES, Linguistics.
                 {\bf I.7.1}: Computing Methodologies, TEXT PROCESSING,
                 Text Editing, Languages.",
}

The latter is not in unicode.bib, but will soon be.

--- --- --- ---

Q: Is it true that Han unification was the core of Unicode, and that the idea of a universal encoding came afterwards?

A: [Kenneth Whistler] The effort by Xerox and Apple to do a Han unification was key to the motivation that eventually led to a serious effort to actually *do* Unicode, and then to establish the Unicode Consortium to standardize and promote it.
However, the idea of a universal encoding predated that considerably. In some respects, the Xerox Character Code Standard (XCCS) was a serious attempt at providing a universal character encoding (although it did not include a unified Han encoding, but only Japanese kanji). XCCS 2.0 (1980) contained, in addition to Japanese kanji: Latin (with IPA), Hiragana, Bopomofo, Katakana, Greek, Cyrillic, Runic, Gothic, Arabic, Hebrew, Georgian, Armenian, Devanagari, Hangul jamo, and a wide variety of symbols. The early Unicoders mined XCCS 2.0 heavily for the early drafts of Unicode 1.0, and always regarded it as the prototype for a universal encoding.

Additionally, you have to consider that the beginning of the ISO project for a Multi-octet Universal Character Set (10646) predated the formal establishment of Unicode. Part of the impetus for the serious work to standardize Unicode was, of course, discontent with the then-current architecture of the early drafts of 10646.

--- --- --- ---

Q: Who invented the name "Unicode", and when?

A: [Kenneth Whistler] This one has a definitive answer: Joe Becker coined the term, for "unique, universal, and uniform character encoding", in 1987. First documented use is in December, 1987.

A: [Nelson H. F. Beebe] On the origin of the name Unicode, my bibliography at <http://www.math.utah.edu/pub/tex/bib/index-table-u.html#unicode>, <ftp://ftp.math.utah.edu/pub/tex/bib/unicode.*> has this to say:

Historical note: a library search on the name ``Unicode'' turns up several entries that predate its use for an international computer character set standard. These include:

- ``Unicode'': the universal telegraphic phrase-book, London (1889, 1896, 1901, 1910).
- Unicode: three-letter difference telegraphic code, Prague, Czechoslovakia (1956, 1967).
- UNICODE automatic coding for UNIVAC scientific data automation system 1103A or 1105, Sperry Rand Corporation, Univac Division (1959).
- Atle Grahl-Madsen, UNICODE 72: Two-letter, three-letter and numerical country codes, Bergen, Norway (1971, 1972).
- David L. Szekely, Unicode: ein Verfahren zur Optimierung der begrifflichen Denkleistung: eine Einfuhrung in die ``vereinheitlichte Wissenschaft'' [a method for optimizing conceptual thinking: an introduction to "unified science"], Basel (1979).
- U. S. Dept. of Health and Human Services, ``Food protection unicode'' (1988).
- Unicode single transition time coding (1976).
- Unicode state assignment techniques (1996).

My suspicion is that someone remembered the UNIVAC UNICODE system when the name was being selected for the ISO 10646 companion project, but let's let the people involved respond. I'm only guessing.

--- --- --- ---

Q: When did the ISO 10646 project start?

A: [Kenneth Whistler] Unfortunately, the document register for early WG2 documents doesn't have dates for all the early documents, and I don't have all the early documents to check. But... The 4th meeting of WG2 was held in London in February, 1986. The first three meetings were in Geneva, Turin, and London, respectively. That puts the likely timeframe for the Geneva meeting, and the establishment of WG2 by SC2, at about 1984. The *only* project for WG2 was 10646. Some of the older old-timers on the list may have more exact information about the early WG2 work.

A: [Tim Greenwood] A paper that I wrote ("International Character Sets - the 7/8 bit story") for an April 1985 conference at Digital references a note from Masami Hasegawa, the original editor of 10646. This note was dated 17 October 1984. Masami's paper "Towards Multi-Lingual Data Processing" for the same conference has the paragraph: 'In the plenary meeting of TC97/SC2 of ISO, which is a sub-committee for information coding, it was decided that an International Standard is needed for a two byte graphic character set. Thus a working group WG2, two-octet graphic, was formed to write a draft proposal.'

--- --- --- ---

Q: When did Unicode and ISO 10646 merge?
A: [Kenneth Whistler] It wasn't a single date that can be pointed to, like the signing of an armistice. In some respects, Unicode and ISO 10646 are *still* merging, as modifications and amendments to deal with niggling little architectural edge cases are worked out. However, the key dates were:

- January 3, 1991. Incorporation of the Unicode Consortium, which signalled to SC2 that the Unicoders were serious in their intentions.
- May, 1991. Meeting #19 of WG2 in San Francisco. An ad hoc meeting took place between WG2 members and some Unicoders, which paved the way for the later "merger" of the standards.
- June, 1991. The 10646 DIS 1 was defeated in its balloting. This left, as the only reasonable way forward, an architectural compromise with the Unicode Standard, which at that point was in copy edit and about to go to press.
- June 3, 1991. The date of "10646M proposal draft to merge Unicode and 10646", by Ed Hart. This was a key document in the resulting merger of features.
- August, 1991. The Geneva WG2 meeting accepted Han unification and combining marks, dropped byte-by-byte restrictions on code values for UCS-2, and accepted Unicode repertoire additions.

From that point forward, the overall aspect of what became ISO/IEC 10646-1:1993 was clear.

A: [Otto Stolz] The merger was initiated by an informal meeting of Unicode and WG2 members during the JTC1/SC2/WG2 meeting in San Francisco, California, USA, in May 1991. At that time, ISO DIS 10646 (the 1st one) was still in ballot, so no formal discussion, let alone an agreement, was allowed by JTC1's rules. By mid-July, DIS 10646 was formally voted down (P-members: 8 YES, 11 NO, 2 abstained; O-members: 1 YES, 3 NO, 0 abstained). 9 out of 14 NO votes mentioned the merger ("only one universal code") in their national comments. The merger, and the basic architecture, were agreed on at the ISO/IEC JTC1/SC2/WG2 meeting in Geneva, Switzerland, August 19th through 23rd, 1991. In October 1991, the ISO SC2 plenary (in Rennes, France) unanimously authorized WG2 to issue a new DIS 10646 in January 1992 for a 4-month (i.e. shortened) vote.

A: [Tim Greenwood] See <http://groups.google.com/groups?q=hasegawa+ISO+10646&hl=en&selm=10635%40sun103.crosfield.co.uk&rnum=2> for a report on the first (or one of the first) merger meetings.

--- --- --- ---

Q: What is the name of the GB and JIS standards that have the same repertoire as Unicode?

A: [Kenneth Whistler] GB 13000 has the same repertoire as ISO/IEC 10646-1:1993. JIS X 0221 has the same repertoire as ISO/IEC 10646-1:1993. Those two were effectively national publications of 10646. You can work out the correlations with Unicode from that. GB 18030:2000 in principle has the same repertoire (but a different encoding) as ISO/IEC 10646-1:2000, i.e. the same as Unicode 3.0. (But there were small problems in it.) However, the 4-byte form of GB 18030 maps all Unicode code points, assigned or not, so it will (in theory, at least) always have the same repertoire as Unicode.

--- --- --- ---

Q: When did Unicode stop being "16 bits"? (I.e., when were surrogates added?)

A: [Kenneth Whistler] In terms of publication, with Unicode 2.0 in 1996. However, the decision was taken by the UTC considerably before publication. Amendment 1 to 10646-1 (UTF-16) was proposed to WG2 in WG2 N970, dated 7 February 1994. Mark Davis was the project editor for that amendment.

--- --- --- ---

Q: I can't remember the version when some scripts were added: Syriac, Thaana, Sinhala, Tibetan, Myanmar, Ethiopic, Cherokee, Canadian Syllabics, Ogham, Runes, Khmer, Mongolian, Yi, Etruscan, Gothic, Deseret, CJK ext. A, CJK ext. B.

A: [Rick McGowan] Tibetan was in 1.0, but was REMOVED in the ISO merger of Unicode 1.1, and came back in a different form in Unicode 2.0.
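[As an aside on the surrogates question above -- my own illustration, not part of any reply -- the UTF-16 surrogate arithmetic for code points beyond the 16-bit range can be sketched in a few lines of Python:]

```python
# UTF-16 surrogate-pair arithmetic for code points U+10000..U+10FFFF.
def to_surrogates(cp):
    """Split a supplementary code point into a (high, low) surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000                   # 20-bit value
    high = 0xD800 + (v >> 10)          # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)         # bottom 10 bits -> low surrogate
    return high, low

def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+10000, the first supplementary code point, maps to the first pair:
assert to_surrogates(0x10000) == (0xD800, 0xDC00)
assert from_surrogates(0xD800, 0xDC00) == 0x10000
```

[This is why the high- and low-surrogate blocks each hold exactly 1024 code points: together they address the 16 supplementary planes.]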
For the rest, you should go to the Enumerated Versions page of the web site!

A: [Kenneth Whistler] See pp. 968-969 of TUS 3.0. Tibetan was in Unicode 1.0, then was removed. It was re-added, in a new encoding, in Unicode 2.0. Syriac, Thaana, Sinhala, Myanmar, Ethiopic, Cherokee, Canadian Syllabics, Ogham, Runic, Khmer, Mongolian, Yi, and CJK Extension A were added in Unicode 3.0. Old Italic (including Etruscan), Gothic, Deseret, and CJK Extension B were added in Unicode 3.1.

A: [Mark Davis] For when particular characters were added to Unicode, you can also consult the new DerivedAge.txt, currently in the BETA at: <http://www.unicode.org/Public/BETA/Unicode3.2/DerivedAge-3.2.0d2.txt>.

--- --- --- ---

Q: Roughly, how many ideographs are in modern use in extensions A and B?

A: [Kenneth Whistler] Not many. I'll defer to the IRG experts to make a guess there.

A: [Thomas Chan] Recently you asked about estimates of usage of Plane 2 characters. Since a large percentage are CNS 11643-1992 characters (and perhaps the oldest IT source), that may provide a clue. According to the "Concluding Remarks" section of Christian Wittern's "Taming the Masses" [1], the higher CNS planes (ignoring planes 1 and 2, which are in the BMP, and perhaps some parts of 3) are rarely used in historic texts, and he expects even lower usage in modern texts.

[1] <http://www.gwdg.de/~cwitter/cw/taming.html>

--- --- --- ---

Q: Roughly, when will version 3.2 become official?

A: [Kenneth Whistler] March, 2002.

--- --- --- ---

Q: Roughly, when will the version 4 book be published?

A: [Kenneth Whistler] Currently still scheduled for March, 2003, but schedule slip is always a possibility on a major publication project like this.

--- --- --- ---

Q: When was ASCII first published, and by whom?

A: [Kenneth Whistler] 1967, by ANSI X3.4. Actually, that was preceded by ASCII per se, the earliest form of which was published as a standard in 1963 by ASA (the American Standards Association -- the predecessor to ANSI). But the 1963 version of ASCII had some differences from what we now know as ASCII.

A: [Nelson H. F. Beebe] That was about 1964 (a few months AFTER IBM System/360 was announced: that delay is the reason we suffered the EBCDIC/ASCII mess for over 30 years). The best source for such early information is this book:

@String{pub-AW = "Ad{\-d}i{\-s}on-Wes{\-l}ey"}
@String{pub-AW:adr = "Reading, MA, USA"}

@Book{Mackenzie:CCS80,
  author =       "Charles E. Mackenzie",
  title =        "Coded Character Sets: History and Development",
  publisher =    pub-AW,
  address =      pub-AW:adr,
  pages =        "xxi + 513",
  year =         "1980",
  ISBN =         "0-201-14460-3",
  LCCN =         "QA268 .M27 1980",
  bibdate =      "Wed Dec 15 10:38:43 1993",
  price =        "US\$24.95",
  series =       "The Systems Programming Series",
}

I checked my copy, and found references on pp. 423ff to ASCII-63 (``When ASCII became an approved American standard in 1963, it was not complete.''), and to ASCII-65, ASCII-67, and USASCII-8 (1964).

A: [John G. Otto] ANSI, 1960s (I'm thinking 1964).

A: [Otto Stolz] Some of your questions are probably answered in Roman Czyborra's WWW pages, particularly in <http://czyborra.com/unicode/standard.html>, <http://czyborra.com/charsets/iso646.html>, <http://czyborra.com/charsets/iso8859.html>, <http://czyborra.com/charsets/cjk.html>, <http://czyborra.com/charsets/codepages.html>.

--- --- --- ---

Q: What standard was current before ASCII? (BAUDOT, is it?) How many bits did it use?

A: [Doug Ewell] Before ASCII there was a wide variety of different encoding standards. Many were designed on the basis of punched card codes or the character repertoires on printer chains, and many did not seem to be "designed" at all, but just thrown together. What was great about ASCII was that it was the first encoding to be anywhere near "universal," even within the United States. You may have heard that EBCDIC predated ASCII, but that is only partially true.
ASCII, being designed as a national standard from the outset, went through years of balloting and committee haggling. EBCDIC, designed by IBM, went into production with comparatively little delay. That accounts for the earlier widespread usage of EBCDIC, but in fact the two were developed concurrently. Some of the more popular character encodings that existed before ASCII were FIELDATA, PTTC, and BCDIC (the 6-bit predecessor to EBCDIC). The links to Roman Czyborra's and Dik Winter's Web sites are valuable; follow them, if you have not already done so. Also, there is a book, "Coded Character Sets, History and Development," by Charles E. Mackenzie, that goes into extraordinary detail about these early character sets. The book is from 1980 and is out of print; I had to pay Amazon USD 66 for a slightly used copy. But you may find it in a technical library. Another good source for information on early character sets is Frank da Cruz, "Mr. Kermit."

A: [Frank da Cruz] I don't have a lot to add to what's been said, other than what's already published in the character sets chapter of the C-Kermit book: <http://www.columbia.edu/kermit/ck60manual.html>, in which the main items of interest are some early Russian sets like DKOI and KOI-7, and the original KOI-8, that we learned about when we visited the USSR in 1989. An extremely detailed and thorough history of ASCII and EBCDIC can be found in the Mackenzie book. For the full reference, see number 52 in the References section of: <http://www.columbia.edu/acis/history/>.

A: [John G. Otto] The Baudot technique was, rather than having separate code points for lower-case and capital letters, to have shift/unshift characters. The last I saw it used was on S-100 based micro-computers that generally used 8-bit ASCII, but were driving old Teletype machines.

A: [Murray Sargent] Btw, I didn't see anyone comment on BCD, which preceded EBCDIC and ASCII and was the first encoding that I used, back in 1962, on the IBM 709. It was different from the old Hollerith stuff, but I don't remember the details and I couldn't find it documented on the Internet. If you find out the encoding, I'd like to see it for old times' sake. Alternatively, I could ask some of my old pals if they still have documentation kicking around somewhere.

A: [Alistair Vining] I just found: <http://www.cwi.nl/~dik/english/codes/stand.html>, whose author (Dik Winter) notes that he 'stop[s] approximately where Roman Czyborra starts'. Thai EBCDIC, JISCII, 6-bit ISO codes, ASCII-1963, etc. Looks very thorough to me, but I wasn't there...

--- --- --- ---

Q: Did the ASCII standard expire, and when?

A: [Rick McGowan] It has not "expired". It is balloted for maintenance every 5 years, and continues to be re-affirmed.

A: [Kenneth Whistler] No, it is still a standard.

A: [John G. Otto] Not that I can tell.

--- --- --- ---

Q: When was ISO 646 published?

A: [Kenneth Whistler] 1972.

A: [Nelson H. F. Beebe] From the bibliography of this book:

@String{pub-DP = "Digital Press"}
@String{pub-DP:adr = "12 Crosby Drive, Bedford, MA 01730, USA"}

@Book{daCruz:1997:UCK,
  author =       "Frank {da Cruz} and Christine M. Gianone",
  title =        "Using {C-Kermit}",
  publisher =    pub-DP,
  address =      pub-DP:adr,
  edition =      "Second",
  pages =        "xxii + 662",
  year =         "1997",
  ISBN =         "1-55558-164-1",
  LCCN =         "TK5105.9.D33 1997",
  bibdate =      "Thu Jan 13 14:33:16 2000",
}

ISO 8859 is dated 1987--1995, and ISO 646 is dated 1983. I have an entry for the latter, but not the former, in my bibliography at <http://www.math.utah.edu/pub/tex/bib/index-table-i.html#isostd>, <ftp://ftp.math.utah.edu/pub/tex/bib/isostd.bib>.
It reads:

@Book{ISO:1983:ISB,
  author =       "{International Organization for Standardization}",
  title =        "{ISO Standard 646}, 7-Bit Coded Character Set for
                 Information Processing Interchange",
  publisher =    pub-ISO,
  address =      pub-ISO:adr,
  edition =      "Second",
  year =         "1983",
  ISBN =         "????",
  LCCN =         "????",
  bibdate =      "Mon Feb 05 17:48:01 2001",
  note =         "Also available as ECMA-6.",
  URL =          "http://www.iso.ch/cate/d4777.html",
  acknowledgement = ack-nhfb,
}

The preamble to the bibliography file gives the Web address for ISO, which should suffice to track down the exact references; I'll certainly have to do that to fill the 8859 hole!

--- --- --- ---

Q: I think that ISO 646 expired. When?

A: [Kenneth Whistler] No, it is still a standard. The current version is the ISO 646 IRV, revised in 1991.

--- --- --- ---

Q: When was ISO 8859 published?

A: [Kenneth Whistler] It comes in many parts, each of which has a separate publication date.

A: [Tim Greenwood] The above paper ("International Character Sets - the 7/8 bit story") has it that the ECMA standard was approved in December 1984, and that ISO and ANSI were approving it as the paper was written in early 1985.

--- --- --- ---

Q: When did the first double-byte encoding appear?

A: [John G. Otto] At least the late 1960s. Control Data used a 6-bit byte on their 60-bit word machines designed by Cray, but to get lower-case letters and a few more special characters, they would use a 6/12 scheme in which a couple of characters (caret ^ and at-sign @, IIRC) were borrowed to modify the next 6 bits to be lower-case, etc.

--- --- --- ---

Q: Are OpenType fonts currently implemented in any platform other than Windows?

A: [John H. Jenkins] OpenType fonts work without modification on Mac OS X, in that the glyphs can be displayed. Any Mac application can access the OT data in the font, parse it, and process it appropriately using public functions. The one piece still missing is automatic support for OT layout data in the system.

A: [Eric Muller] FreeType implements OpenType, including layout. By construction, FreeType only requires an ANSI C implementation, and was written with embedded systems in mind. Thus, the answer to your question could be "all".

A: [John Hudson] 'OpenType support' means a number of different things. Support for the font file format and rasterisation of the TT or CFF outlines is widespread, including Windows, OS X (native), earlier Mac systems (CFF only, using ATM), and implementations of FreeType. Support for individual OpenType Layout typographic features varies from application to application. Support for script shaping features and character-level pre-formatting, e.g. for Indic scripts, is available in Windows apps that use Uniscribe for text processing, and I believe the FreeType developers have also been working on Indic shaping, although I am not sure if this has been released yet.

A: [Alan Wood] Yes. Apple supplies 4 Japanese OpenType fonts with Mac OS X - Hiragino Kaku Gothic Pro, Hiragino Kaku Gothic Std, Hiragino Maru Gothic Pro and Hiragino Mincho Pro. Adobe supplies TektonPro with InDesign 1.5 for Mac OS 9.

--- --- --- ---
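[One last aside of my own, on the GB 18030 answer above: since the 4-byte form of GB 18030 maps every Unicode code point, any character should survive a round trip through it. This can be checked with Python's built-in "gb18030" codec -- a sketch, not from any of the replies:]

```python
# GB 18030's 4-byte form covers the whole Unicode code space, so a
# round trip through Python's built-in 'gb18030' codec should be lossless
# for any (non-surrogate) code point, BMP or supplementary.
for cp in (0x41, 0x4E2D, 0x10000, 0x20000):   # 'A', a Han char, Plane 1, Plane 2
    ch = chr(cp)
    assert ch.encode("gb18030").decode("gb18030") == ch

# Supplementary-plane characters take the 4-byte form:
assert len(chr(0x10000).encode("gb18030")) == 4
```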

