On Tue, 20 Jan 2004 00:36:54 -0800, Asmus Freytag wrote:
>
> Currently, Variation Selectors work only one way. You could 'force' one
> particular shape. Leaving the VS off, gives you no restriction, leaving
> the software free to give you either shape. W/o defining the use of two
> VSs you cannot 'force' the 'regular' shape.

Yes, I had forgotten this. In practice, though, I would imagine that only the most perverse font would use an unexpected glyph variant as the standard glyph for a character. To go back to my simplistic example of the long s (which I hope no-one is taking too seriously), I think that the user would be justified in expecting an ordinary short s to be displayed for U+0073 in isolation, and I doubt that many fonts would map a long s glyph directly to U+0073. Thus although you cannot force the "regular" glyph shape, you can force the font's default glyph shape by omitting the VS, and in most fonts the default glyph would be the same as the "regular" Unicode code chart glyph.

> Also, the way most VSs are defined, their use does not depend
> on context the same way as the example suggests.
>

Absolutely. My understanding is that the Mongolian Free Variation Selectors (and the hypothetical long-s FVS) function quite differently from the ordinary variation selectors currently used for mathematical symbols, proposed for Phags-pa, and apparently coming soon for Han ideographs. In the case of Mongolian the rendering system can determine the expected glyph form from a set of deterministic rules, and so an FVS need only be applied when the rules have to be broken. On the other hand, there are no rules that allow the rendering system to know which particular Standardised Variant glyph form to use for an unmarked Unicode character in a particular context, so the VS must be applied manually by the user or IME.
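To make the "one way" behaviour concrete, here is a minimal Python sketch of how software might treat variation sequences: a selector restricts the glyph for the base character it follows, and removing it leaves the choice to the font. The function names are illustrative only, and the long-s sequence <U+0073, U+FE00> is the hypothetical example from this thread, not a real standardised variant.

```python
# Variation Selectors 1 to 16 occupy U+FE00..U+FE0F.
VS_FIRST, VS_LAST = 0xFE00, 0xFE0F


def is_variation_selector(ch):
    """True if ch is one of VARIATION SELECTOR-1 .. VARIATION SELECTOR-16."""
    return VS_FIRST <= ord(ch) <= VS_LAST


def strip_variation_selectors(text):
    """Drop every selector, leaving each base character unrestricted."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def split_sequences(text):
    """Return (base, selector) pairs; selector is None when no VS follows."""
    pairs = []
    i = 0
    while i < len(text):
        base = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else None
        if (nxt is not None
                and not is_variation_selector(base)
                and is_variation_selector(nxt)):
            pairs.append((base, nxt))
            i += 2
        else:
            pairs.append((base, None))
            i += 1
    return pairs


# HYPOTHETICAL sequence: <U+0073, U+FE00> is not a standardised variant.
marked = "s\uFE00un"
plain = strip_variation_selectors(marked)
```

Here `plain` carries the same base characters as `marked`, so a search or comparison on the stripped text treats the two as the same word, while the marked text still records which glyph form was requested. Nothing in the sketch can request the "regular" shape explicitly, which is the limitation Asmus describes.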

My understanding of the circumstances under which standardised variation selectors are a good idea is typified by the four proposed Phags-pa standardised variants:

A85B FE00 -- PHAGS-PA LETTER YA with rounded appearance
A860 FE00 -- PHAGS-PA LETTER HA without tail kink
A864 FE00 -- PHAGS-PA LETTER FA with tail kink
A85E FE00 -- PHAGS-PA LETTER SHA with sloping stroke

These are glyph variants of Phags-pa letters that are used with semantic distinctiveness in a single (but very important) text, _Menggu Ziyun_, a 14th century rhyming dictionary of Chinese in which Chinese ideographs are listed by their Phags-pa spellings. In this one text only, variant forms of the letters FA, SHA, HA and YA are used contrastively in order to represent historical phonetic differences between Chinese syllables that were pronounced the same in early 14th century standard Chinese (Old Mandarin). For example:

A. The ideographs SHU 書 [U+66F8] and SHU 殊 [U+6B8A] were pronounced the same in Old Mandarin, but were historically distinct (in the Chinese of the Tang dynasty), the former with a reconstructed [U+0255] initial, the latter with a reconstructed [U+0291] initial. In _Menggu Ziyun_ the former SHU is spelled sheeu and the latter SHU spelled sh'eeu (where sh' is a glyph variant of sh).

B. The ideographs YIN 因 [U+56E0] and YIN 寅 [U+5BC5] were pronounced the same in Old Mandarin (other than tone, which is not represented in Phags-pa spelling of Chinese), but were historically distinct, the former with a reconstructed null initial, the latter with a reconstructed [j] initial. In _Menggu Ziyun_ the former YIN is spelled yin and the latter YIN spelled y'in (where y' is a glyph variant of y).

C. The ideographs XIAN 險 [U+96AA] and XIAN 嫌 [U+5ACC] were pronounced the same in Old Mandarin (other than tone), but were historically distinct, the former with a reconstructed [x] initial, the latter with a reconstructed [U+0263] initial.
In _Menggu Ziyun_ the former XIAN is spelled hyem and the latter XIAN spelled h'yem (where h' is a glyph variant of h).

D. The ideographs FANG 方 [U+65B9] and FANG 房 [U+623F] were pronounced the same in Old Mandarin (other than tone), but were historically distinct, the former with a reconstructed [p] initial, the latter with a reconstructed [b] initial. In _Menggu Ziyun_ the former FANG is spelled fang and the latter FANG spelled f'ang (where f' is a glyph variant of f).

However, in actual Phags-pa manuscript/printed texts and epigraphic inscriptions there is no distinction between pairs of ideographs such as these, and the same glyph form is used for all occurrences of the letters FA, SHA, HA and YA respectively.

Thus the Phags-pa letters FA, SHA, HA and YA represent "f", "sh", "h" and "y" however they are written, but in one particular textual context a glyph distinction is used to carry additional historical phonetic information that you may or may not want to preserve in electronic texts.

As Asmus says, "A VS approach is potentially indicated when its necessary to manually select non-deterministic variants (or to override deterministic ones) and at the same time it's desired to use the same base character code to carry the same base meaning". I think that the proposed Phags-pa standardised variants exactly meet these criteria.

Like others on this list I would like to hear more about the Han standardised variants, as I'm more than a little uneasy about the use of variation selectors to select simple glyph variants (e.g. traditional versus modern glyph forms) that have no semantic distinctions, as would seem to be the case from what John was saying. You can make pretty much the same case for needing to represent glyph forms in plain text for any script, especially if you're an epigrapher or textual scholar. What's so special about Han ideographs, I wonder?

Andrew

