The WG2 home page was updated today to add a link to document N2507, "Draft of Proposal to add Latin characters required by Latinized Taiwanese Holo language to ISO/IEC 10646" [1], by a group called the Department of Language Education of National Taitung Teachers College. The document is dated either 2002-03-11 or 2002-03-31, depending on what part of the title page you look at.
This document proposes a COMBINING RIGHT DOT ABOVE for use in a popular Latin-script orthography of the Taiwanese Holo language. Some time ago (I can't look up exactly when because the unicode.org archives are unavailable), I wrote that this combining character should be added in lieu of a largish collection of precomposed characters. Ken Whistler responded that the issue had already been debated, and that a solution had already been presented: use U+0307 COMBINING DOT ABOVE (possibly incorporating a Taiwanese font-specific glyph variation to move the dot to the right). Evidently the Taiwanese teachers did not consider this satisfactory, as they have responded with this new proposal to encode a separate COMBINING RIGHT DOT ABOVE.

Whether or not this new combining character makes sense, the rest of the proposal clearly does not. The group has proposed no fewer than 42 precomposed Latin characters, all of which can be formed using existing Latin letters and combining marks (together with the proposed RIGHT DOT ABOVE). The 42 precomposed letters are proposed "to be added to Latin Extended-B," which puzzles me, since that block has only 25 available code positions as of Unicode 4.0.

Much more troubling, however, is the fact that this group has apparently overlooked or disregarded the Unicode/10646 policy against standardizing new precomposed letters that can be composed with existing characters. The document says:

"The precomposed characters are proposed to ensure compatibility with the existing font "HoloWin" in the word-processing software HOTSYS widely employed in the user community. We have been promised composing characters in major (Microsoft etc.) implementations since 1997. Now, 5 years later, we still have nothing."

Compatibility with 8-bit legacy fonts and software is *not* sufficient cause for encoding new precomposed characters.
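For anyone who wants to check the composition claim above, here is a minimal sketch in Python, assuming only the standard-library unicodedata module. The helper name describe is mine, purely for illustration, and the results depend on the Unicode data bundled with your Python. It compares n followed by U+0302, for which no precomposed letter is encoded, with e followed by U+0302, for which one is.

```python
import unicodedata


def describe(text):
    """Return the Unicode character names of the code points in text, in order."""
    return [unicodedata.name(ch) for ch in text]


# LATIN SMALL LETTER N followed by COMBINING CIRCUMFLEX ACCENT.
n_circumflex = "\u006E\u0302"

# NFC composes a base letter and a mark only when a precomposed
# code point exists for that pair. None is encoded for n + circumflex,
# so the sequence is left as two code points.
n_nfc = unicodedata.normalize("NFC", n_circumflex)

# Contrast: e + circumflex has a long-standing precomposed form, U+00EA,
# so NFC collapses it to a single code point.
e_nfc = unicodedata.normalize("NFC", "\u0065\u0302")

print(describe(n_nfc))
print(describe(e_nfc))
```

Either way the text is valid Unicode; the only difference is whether a single-code-point equivalent happens to exist. Whether the two-code-point form displays well is a matter for the font and rendering engine, not the encoding.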
The WG2 "Principles and Procedures" document [2] specifically states that a precomposed character should not be encoded "if solely intended to overcome short-term deficiency of rendering technology." The Taiwanese document does not say which "major (Microsoft etc.) implementation" fails to support composition using combining marks, but as a previous thread on this list has shown, there is at least some support in Internet Explorer for such characters.

Try this experiment: One of the precomposed characters proposed by the Taiwanese teachers is LATIN SMALL LETTER N WITH CIRCUMFLEX. Here it is, encoded properly as U+006E U+0302:

n̂

Some of you will be able to see this character, others will not. Rendering technology is not perfect yet. But this is the correct way to create new accented letters in Unicode/10646, not by adding more precomposed characters.

The proposal for a new COMBINING RIGHT DOT ABOVE may or may not have merit -- I'm not going to commit firmly to the idea that it does, as I did last time -- but the 42 precomposed letters have no business being encoded and should not be debated further.

-Doug Ewell
Fullerton, California

[1] http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2507.pdf
[2] http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2352r.pdf

