Andrew: A small group has been working on these and other questions for a while now, after the last group of questions raised on Mongolian on this list. I will get in contact with you separately with some of our work.
For the moment, in short: yes, use the TR170 document, especially its detailed examples (which are fuller than the textual explanations, and have implications not explicitly stated); there is also a Chinese book called Mengguwen bianma, which in parts is fuller and more explicit. There are still some rare cases not covered by either.

Martin Heijdra

----- Original Message -----
From: "Andrew C. West" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, December 16, 2002 8:40 AM
Subject: Mongolian Encoding

> As promised, here are some questions on the encoding of Mongolian that have
> arisen whilst writing an input method for the Mongolian script (the questions
> are relevant to the Todo, Manchu and Sibe scripts as well, but I'll restrict
> myself to Mongolian for the moment). I don't know if anyone is able to answer
> all of my questions, but I hope that someone on the list will be able to give me
> some much needed advice.
>
> 1. Documentation
> Section 11.4 of the Unicode Standard notes that a group of experts from
> Mongolia, China and the West are to publish a document called "User's Convention
> for System Implementation of the International Standard on Mongolian Encoding"
> which will explicitly define Mongolian character shaping behaviour in full. WG2
> document N1980 (http://std.dkuug.dk/jtc1/sc2/WG2/docs/n1980.doc) also states
> that Mongolian, Chinese and English versions of the "User's Convention" will be
> prepared by Mongolia and China. I have been unable to locate this document on
> the internet. Does it exist, and if so can it be made publicly available?
> Without the aid of such a document it seems almost impossible to correctly
> implement the Unicode encoding of Mongolian.
> In its stead I have been using the document "Traditional Mongolian Script in the
> ISO/IEC 10646 and Unicode Standards" (UNU/IIST Report No.
> 170, August 1999) written by Myatav Erdenechimeg, Richard Moore and Yumbayar
> Namsrai as a guide to Mongolian character shaping behaviour. It seems to provide
> all the information I would expect to see in the "User's Convention", but I am
> not sure how authoritative this paper is, and what its relationship is to the
> "User's Convention" (if any).
>
> 2. Free Variation Selectors
> The Mongolian Free Variation Selectors (U+180B, U+180C and U+180D) are used to
> distinguish variant graphic forms of the same positional form of a character. I
> would say that there are three categories of variant forms governed by the
> variation selectors:
> A. Non-contextual variants, such as variant forms of letters that are used in
> foreign words (e.g. the use of a "reclining" letter D -- U+1833 + FVS1 -- in
> foreign words), and graphic variations that are due to differences between
> traditional and modern orthography. Such variants must be explicitly encoded by
> use of the appropriate variation selector in order for the correct form to be
> selected by the rendering engine.
> B. Contextual variants that are determined by the overall composition of the
> word in which they are found, such as the use of the long-toothed forms of the
> letters OE and UE (U+1825/1826 + FVS1) in the first syllable of a word only, or
> the use of the feminine form of the letter G (U+182D + FVS3) between consonants
> or the letter I (which is neutral) in a feminine word. In these cases I would
> imagine that it is too much to ask the rendering engine to work out the correct
> variant form, and the correct variant should be explicitly encoded using the
> appropriate variation selector.
> C. Contextual variants that can be determined from their neighbouring letters,
> such as the medial form of the letter G with two dots that is used before a
> vowel (U+182D + FVS2), or the form of the letter A that is written with a
> forward tail when occurring finally after the letters B, P, F and K (U+1820 +
> FVS1). In these cases is it necessary to explicitly encode the variant form with
> the appropriate variation selector? The Standard says "For cases in which the
> contextual sequence of basic letters is not sufficient for a rendering engine to
> uniquely determine the appropriate glyph for a particular letter, additional
> format characters are provided so that the typist may specify the desired
> rendering". Should we assume that the rendering engine will correctly select the
> dotted form of medial G before a vowel and the dotless form before a consonant,
> or would it be wiser to explicitly encode the appropriate variation selector
> anyway?
>
> 3. Mongolian Vowel Separator
> The Mongolian Vowel Separator (U+180E) is used to separate the vowels A and E
> from certain preceding consonants (e.g. ...N + MVS + A = U+1828,180E,1820).
> After MVS the vowels A and E use the forward tail variant which is physically
> offset from the preceding consonant by narrow whitespace. These variant forms of
> A and E are selected by the presence of a preceding MVS, and there appears to be
> no need to otherwise select the variant A or E by means of a variation
> selector.
> However, not only does the MVS affect the following A or E, but the preceding
> consonant may also take a variant form when followed by an offset A or E. This
> is the case for the letters N, Q, G, J, Y and W. The variant forms of these
> letters when preceding an offset A or E are given in Unicode's Standardized
> Variants document (N, Q, G, J and Y are given as medial variants, but W is given
> as a final variant, which is perhaps wrong).
> My question is, should the variant form of the consonant preceding the offset
> A or E be explicitly encoded using the appropriate variation selector, or is
> the presence of the following MVS sufficient for the rendering engine to select
> the correct variant form?
>
> 4. Variant forms of the Mongolian Birga
> Appendix A of "Traditional Mongolian Script in the ISO/IEC 10646 and Unicode
> Standards" lists four variant forms of the Mongolian Birga (U+1800):
> 1st variant form = U+1800 + FVS1
> 2nd variant form = U+1800 + FVS2
> 3rd variant form = U+1800 + FVS3
> 4th variant form = U+1800 + ZWJ
>
> Unicode's Standardized Variants document
> (http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html) does not list
> any variants for the Mongolian Birga. Moreover, it warns "All combinations not
> listed here are unspecified and are reserved for future standardization; no
> conformant process may interpret them as standardized variants." This clearly
> means that these Birga variants should not currently be recognised. But given
> that the Birga does occur in a number of forms, Unicode should either define
> standardized variants for them or add some new characters to represent them.
> Nevertheless, assuming that Appendix A of "Traditional Mongolian Script" is
> correct in providing a mechanism for distinguishing four variant forms of the
> Mongolian Birga, is it acceptable to use the ZWJ as a variant selector (as is
> the case for the 4th variant Birga)? Its usage here seems a little suspect to
> me.
>
> Andrew
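For anyone wanting to experiment with the sequences discussed in the quoted message, here is a minimal Python sketch. It only builds and inspects the code point sequences (N + MVS + A, D + FVS1, G + FVS2); it says nothing about how any rendering engine actually shapes them. The helper names code_points and explicit_variants are hypothetical, chosen for illustration, and are not part of any library.

```python
# Code points quoted in the message above.
FVS1, FVS2, FVS3 = "\u180B", "\u180C", "\u180D"  # Free Variation Selectors 1-3
MVS = "\u180E"                                   # Mongolian Vowel Separator
A, NA, GA, DA = "\u1820", "\u1828", "\u182D", "\u1833"  # letters A, N, G, D

SELECTORS = {FVS1: "FVS1", FVS2: "FVS2", FVS3: "FVS3"}


def code_points(text):
    """Return the sequence as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def explicit_variants(text):
    """Hypothetical helper: return (base, selector) pairs wherever a
    character is immediately followed by FVS1, FVS2 or FVS3, i.e. wherever
    the typist has explicitly encoded a variant form."""
    pairs = []
    for base, mark in zip(text, text[1:]):
        if mark in SELECTORS and base not in SELECTORS:
            pairs.append((f"U+{ord(base):04X}", SELECTORS[mark]))
    return pairs


# Sequences taken from the questions above.
n_mvs_a = NA + MVS + A    # question 3: ...N + MVS + A
reclining_d = DA + FVS1   # question 2A: "reclining" D in foreign words
dotted_g = GA + FVS2      # question 2C: medial G with two dots

print(code_points(n_mvs_a))
print(explicit_variants(reclining_d + dotted_g + n_mvs_a))
```

Note that the N + MVS + A sequence contains no variation selector at all, which is exactly the situation question 3 asks about: whether the MVS alone is enough for the engine to pick the variant forms on both sides of it.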

