I've just realised that Robert's postings to the Unicode list are not getting through, and so I'm forwarding the original message which I only excerpted in my reply yesterday.
------- Start of forwarded message -------
From: "Robert R. Chilton" <[EMAIL PROTECTED]>
Date: Sat, 04 Jan 2003 00:13:45 -0500
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: PRC asking for 956 precomposed Tibetan characters
To: "Andrew C. West" <[EMAIL PROTECTED]>

Andrew C. West wrote:
>
> ...
>
> Nevertheless, whether the Chinese proposal fails to include certain
> transliteration letters or obscure Sanskrit-usage stacks or special letters used
> for writing Dzongkha (although as far as I know Dzongkha is just a dialect of
> Tibetan - or a separate language for political reasons - and written Dzongkha is
> much the same as written Tibetan ... no doubt someone will correct me on this)
> is largely irrelevant. The proposal could easily be expanded to include the
> non-PRC usage letters, or a separate "Extended Brdarten" block could be
> proposed. The key point is that the existing Tibetan encoding model works just
> fine for all varieties of Tibetan, and there is simply no need for precomposed
> Tibetan characters.

I agree that the main objection to n2558 is that it is simply unnecessary; the existing Tibetan encoding model is not only sufficient but enables a far greater range of Tibetan-script orthography than the character set proposed in n2558. Moreover, for the authors of n2558 to argue that a non-combining model of Tibetan is necessary for compatibility with "traditional education, publication and electronic desktop publishing systems" is to entirely discount the use of other complex scripts --such as the Indic scripts which employ a combining model-- in such "systems". Clearly, the direction of such a rationale runs entirely opposite to the basic principles of Unicode/ISO-10646.

> I've posted my analysis of document n2558, together with a table mapping the
> proposed glyphs to existing Unicode sequences, at
> http://uk.geocities.com/babelstone1357/Tibetan/brdarten.html

Although I have not yet had time to check through Andrew's table mapping the proposed glyphs in n2558 to existing Unicode sequences, I can respond to his observations, below.

> These are my main observations :
>
> 1. The proposal includes a single, apparently arbitrary, example of a consonant
> plus triple E vowel (Glyph 107) that is found only in Tibetan shorthand
> abbreviations, but many other consonant plus multiple vowel sign shorthand
> abbreviations that are frequently encountered in prayer flags and elsewhere are
> not covered by this proposal. (See
> http://uk.geocities.com/babelstone1357/Tibetan/shorthand.html for some
> illustrated examples of shorthand abbreviations.)

Such cases of triple (or quadruple) vowels E or O are best normalized to double vowel plus single (or double) vowel to aid in collation and other character data processing functions. Thus, Glyph 107 is best encoded as (or normalized to) <U+0F41, U+0FB1, U+0F7B, U+0F7A>.

> 2. The proposal includes two examples of letters (KA and KHA) with a superfixed
> TIBETAN SIGN LCE TSA CAN [U+0F88] (Glyphs 029 and 100). This sign is most
> commonly used in Kalachakra literature, and there are presumably other instances
> of its usage combined with different letters that are not covered by this
> proposal. I'm not entirely sure how these glyphs should be encoded using the
> existing Unicode character encoding model - I assume that the sign LCE TSA CAN
> [U+0F88] should be encoded immediately following the base consonant with which
> it is associated (i.e. <U+0F40, U+0F88> for Glyph 029 and <U+0F41, U+0F88> for
> Glyph 100). Please correct me if I'm wrong.
>
> 3. The proposal includes two examples of letters (PA and PHA) with a superfixed
> TIBETAN MARK PALUTA [U+0F85] (Glyphs 435 and Glyph 486). Presumably there are
> other instances of its usage combined with different letters that are not
> covered by this proposal. Again I'm not entirely sure how these glyphs should be
> encoded using the existing Unicode character encoding model - I assume that the
> paluta [U+0F85] should be encoded immediately following the base consonant with
> which it is associated (i.e. <U+0F54, U+0F85> for Glyph 435 and <U+0F55, U+0F85>
> for Glyph 486). Please correct me if I'm wrong.

Assuming that there have been no changes in the combining classes of these characters since Unicode 3.0, the 2 characters <U+0F88> and <U+0F89> are spacing, non-combining characters. Therefore, the only possible encoding that will place the "base consonant" under these signs (i.e., will result in these signs being "superfixed" to the letters KA, KHA, PA, PHA, et al.) is for these characters to appear in the data stream just prior to the "base consonant", such base consonant being encoded in subjoined position. [It is not really correct to say that "The Unicode Standard does not explicitly specify the coding sequence for letters that are combined with any of the transliteration characters U+0F88 through U+0F8B" since the combining class of the characters is determinative.]

Thus, to encode Glyphs 029 and 100 use <U+0F88, U+0F90> and <U+0F88, U+0F91>, respectively. Likewise, to encode Glyphs 435 and 486 use <U+0F89, U+0FA4> and <U+0F89, U+0FA5>, respectively. Note that these latter two glyphs are *NOT* a case of superfixed TIBETAN MARK PALUTA but rather a case of superfixed TIBETAN SIGN MCHU CAN. The PALUTA has a different function (of transliterating the Sanskrit apostrophe in Tibetan script) and is not found in superfixed position.

[Note also that a naive reader might mistake the TIBETAN SIGN MCHU CAN for a superfixed NYA, just as one might confuse the NYA and the PALUTA.]

> 4. Glyph 687 [Tibetan BrdaRten Character ZHA], Glyph 698 [Tibetan BrdaRten
> Character ZA] and Glyph 713 [Tibetan BrdaRten Character AHA] in the proposal are
> respectively the letters ZHA [U+0F5E], ZA [U+0F5F] and -A [U+0F60] with a dot
> slightly right of centre over the top of the letter. I do not recognise this
> dot-like mark, and the names given in Document N2558 do not explain what it
> signifies. Can anyone enlighten me ?

Though I confess that I am not familiar with these orthographies, the glyphs cited are cases of TIBETAN MARK TSA -PHRU [U+0F39] being affixed to letters ZHA, ZA, and -A, respectively. They would be encoded as <U+0F5E, U+0F39>, <U+0F5F, U+0F39> and <U+0F60, U+0F39>.

I hope this is useful.

New Year's greetings to all,

Robert Chilton
Technical Director
The Asian Classics Input Project

------- End of forwarded message -------
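For anyone who wants to check the sequences in Robert's message against the character properties, here is a rough sketch in Python. It is only an illustration, not part of Robert's message: the glyph-to-sequence table is copied from his text, the helper names `describe` and `to_text` are made up for this example, and the results depend on whichever version of the Unicode Character Database ships with your Python. I would expect U+0F88 and U+0F89 to report combining class 0, which is the point Robert's argument turns on.

```python
import unicodedata

# Code point sequences as given in Robert's message, keyed by n2558 glyph number.
SEQUENCES = {
    "107": [0x0F41, 0x0FB1, 0x0F7B, 0x0F7A],  # KHA + subjoined YA + EE + E
    "029": [0x0F88, 0x0F90],                  # LCE TSA CAN + subjoined KA
    "100": [0x0F88, 0x0F91],                  # LCE TSA CAN + subjoined KHA
    "435": [0x0F89, 0x0FA4],                  # MCHU CAN + subjoined PA
    "486": [0x0F89, 0x0FA5],                  # MCHU CAN + subjoined PHA
    "687": [0x0F5E, 0x0F39],                  # ZHA + TSA -PHRU
    "698": [0x0F5F, 0x0F39],                  # ZA + TSA -PHRU
    "713": [0x0F60, 0x0F39],                  # -A + TSA -PHRU
}


def describe(code_points):
    """Return (U+XXXX, general category, combining class, name) per code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append((
            f"U+{cp:04X}",
            unicodedata.category(ch),
            unicodedata.combining(ch),
            unicodedata.name(ch, "<unnamed>"),
        ))
    return rows


def to_text(code_points):
    """Join a list of code points into a string."""
    return "".join(chr(cp) for cp in code_points)


if __name__ == "__main__":
    for glyph, cps in SEQUENCES.items():
        print(f"Glyph {glyph}:")
        for row in describe(cps):
            print("   ", *row)
```

Running the script prints one row per character, so the general category and combining class of each sign can be compared with what the message says about it.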

