As promised, here are some questions on the encoding of Mongolian that have arisen whilst writing an input method for the Mongolian script (the questions are relevant to the Todo, Manchu and Sibe scripts as well, but I'll restrict myself to Mongolian for the moment). I don't know if anyone is able to answer all of my questions, but I hope that someone on the list will be able to give me some much-needed advice.

1. Documentation

Section 11.4 of the Unicode Standard notes that a group of experts from Mongolia, China and the West are to publish a document called "User's Convention for System Implementation of the International Standard on Mongolian Encoding" which will explicitly define Mongolian character shaping behaviour in full. WG2 document N1980 (http://std.dkuug.dk/jtc1/sc2/WG2/docs/n1980.doc) also states that Mongolian, Chinese and English versions of the "User's Convention" will be prepared by Mongolia and China. I have been unable to locate this document on the internet. Does it exist, and if so can it be made publicly available? Without the aid of such a document it seems almost impossible to correctly implement the Unicode encoding of Mongolian.

In its stead I have been using the document "Traditional Mongolian Script in the ISO/IEC 10646 and Unicode Standards" (UNU/IIST Report No. 170, August 1999) written by Myatav Erdenechimeg, Richard Moore and Yumbayar Namsrai as a guide to Mongolian character shaping behaviour. It seems to provide all the information I would expect to see in the "User's Convention", but I am not sure how authoritative this paper is, and what its relationship is to the "User's Convention" (if any).

2. Free Variation Selectors

The Mongolian Free Variation Selectors (U+180B, U+180C and U+180D) are used to distinguish variant graphic forms of the same positional form of a character. I would say that there are three categories of variant forms governed by the variation selectors:

A. Non-contextual variants, such as variant forms of letters that are used in foreign words (e.g. the use of a "reclining" letter D -- U+1833 + FVS1 -- in foreign words), and graphic variations that are due to differences between traditional and modern orthography. Such variants must be explicitly encoded by use of the appropriate variation selector in order for the correct form to be selected by the rendering engine.

B.
Contextual variants that are determined by the overall composition of the word in which they are found, such as the use of the long-toothed forms of the letters OE and UE (U+1825/1826 + FVS1) in the first syllable of a word only, or the use of the feminine form of the letter G (U+182D + FVS3) between consonants or the letter I (which is neutral) in a feminine word. In these cases I would imagine that it is too much to ask the rendering engine to work out the correct variant form, and the correct variant should be explicitly encoded using the appropriate variation selector.

C. Contextual variants that can be determined from their neighbouring letters, such as the medial form of the letter G with two dots that is used before a vowel (U+182D + FVS2), or the form of the letter A that is written with a forward tail when occurring finally after the letters B, P, F and K (U+1820 + FVS1). In these cases is it necessary to explicitly encode the variant form with the appropriate variation selector? The Standard says "For cases in which the contextual sequence of basic letters is not sufficient for a rendering engine to uniquely determine the appropriate glyph for a particular letter, additional format characters are provided so that the typist may specify the desired rendering". Should we assume that the rendering engine will correctly select the dotted form of medial G before a vowel and the dotless form before a consonant, or would it be wiser to explicitly encode the appropriate variation selector anyway?

3. Mongolian Vowel Selector

The Mongolian Vowel Selector (U+180E) is used to separate the vowels A and E from certain preceding consonants (e.g. ...N + MVS + A = U+1828,180E,1820). After MVS the vowels A and E use the forward tail variant, which is physically offset from the preceding consonant by narrow whitespace.
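
To make the N + MVS + A example concrete, here is a minimal Python sketch of how an input method might build that sequence and locate the offset vowels purely from the preceding MVS. The helper name offset_vowel_positions is hypothetical, and U+1821 for the letter E is taken from the Mongolian code chart rather than from the text above; nothing here says what a rendering engine must do.

```python
# Code points from the Mongolian block (U+1800..U+18AF).
MVS = "\u180E"        # MONGOLIAN VOWEL SEPARATOR
LETTER_A = "\u1820"   # MONGOLIAN LETTER A
LETTER_E = "\u1821"   # MONGOLIAN LETTER E (assumed from the code chart)
LETTER_N = "\u1828"   # MONGOLIAN LETTER NA


def offset_vowel_positions(text):
    """Return the indexes of every A or E that directly follows an MVS.

    Hypothetical helper: it only inspects the encoded sequence and makes
    no claim about which glyph a renderer will actually choose.
    """
    return [
        i
        for i in range(1, len(text))
        if text[i] in (LETTER_A, LETTER_E) and text[i - 1] == MVS
    ]


# ...N + MVS + A, as in the example above.
word_ending = LETTER_N + MVS + LETTER_A
code_points = [f"U+{ord(ch):04X}" for ch in word_ending]

print(code_points)
print(offset_vowel_positions(word_ending))
```

Note that no variation selector appears in the sequence: the offset form of the vowel is implied by the MVS alone, which is exactly the assumption the following question is about.
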
These variant forms of A and E are selected by the presence of a preceding MVS, and there appears to be no need to otherwise select the variant A or E by means of a variation selector. However, not only does the MVS affect the following A or E, but the preceding consonant may also take a variant form when followed by an offset A or E. This is the case for the letters N, Q, G, J, Y and W. The variant forms of these letters when preceding an offset A or E are given in Unicode's Standardized Variants document (N, Q, G, J and Y are given as medial variants, but W is given as a final variant, which is perhaps wrong). My question is: should the variant form of the consonant preceding the offset A or E be explicitly encoded using the appropriate variation selector, or is the presence of the following MVS sufficient for the rendering engine to select the correct variant form?

4. Variant forms of the Mongolian Birga

Appendix A of "Traditional Mongolian Script in the ISO/IEC 10646 and Unicode Standards" lists four variant forms of the Mongolian Birga (U+1800):

1st variant form = U+1800 + FVS1
2nd variant form = U+1800 + FVS2
3rd variant form = U+1800 + FVS3
4th variant form = U+1800 + ZWJ

Unicode's Standardized Variants document (http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html) does not list any variants for the Mongolian Birga. Moreover, it warns "All combinations not listed here are unspecified and are reserved for future standardization; no conformant process may interpret them as standardized variants." This clearly means that these Birga variants should not currently be recognised. But given that the Birga does occur in a number of forms, Unicode should either define standardized variants for them or add some new characters to represent them.
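
The conformance point can be sketched as a simple lookup. The table below is a hypothetical, hand-typed subset containing only the base + selector pairs mentioned in this message (and assumed here to be listed); a real implementation would load the full list from Unicode's Standardized Variants data instead. The names KNOWN_VARIANTS and is_listed_variant are made up for illustration.

```python
FVS1, FVS2, FVS3 = "\u180B", "\u180C", "\u180D"  # Free Variation Selectors
ZWJ = "\u200D"                                    # ZERO WIDTH JOINER
BIRGA = "\u1800"                                  # MONGOLIAN BIRGA

# Illustrative subset only: pairs mentioned in this message, assumed to be
# listed as standardized variants. NOT the complete or authoritative list.
KNOWN_VARIANTS = {
    ("\u1820", FVS1),  # A, forward-tail final form
    ("\u1825", FVS1),  # OE, long-toothed form
    ("\u1826", FVS1),  # UE, long-toothed form
    ("\u182D", FVS2),  # G, dotted medial form
    ("\u182D", FVS3),  # G, feminine form
    ("\u1833", FVS1),  # D, "reclining" form
}


def is_listed_variant(base, selector):
    """Return True only if the pair appears in the (subset) table."""
    return (base, selector) in KNOWN_VARIANTS


# None of the four Birga sequences from Appendix A is in the table.
birga_results = [is_listed_variant(BIRGA, sel) for sel in (FVS1, FVS2, FVS3, ZWJ)]
print(birga_results)
```

Under this reading, a process that treats unlisted combinations as unspecified would pass all four Birga sequences through without giving them any special interpretation.
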

Nevertheless, assuming that Appendix A of "Traditional Mongolian Script" is correct in providing a mechanism for distinguishing four variant forms of the Mongolian Birga, is it acceptable to use the ZWJ as a variant selector (as is the case for the 4th variant Birga)? Its usage here seems a little suspect to me.

Andrew

