On Fri, 13 Jan 2017 10:27:35 -0800 Asmus Freytag <[email protected]> wrote:
> This points to another interesting issue. A number of languages have
> seen orthographic reforms that affect the use of complex scripts.
> Now then, a decision: do you support both the old and the new style
> in the same rule-set? If vestiges remain in general use, you may not
> have a choice, but, what if the rules for old and new (or for
> different languages in the same script) actually conflict?

What we have seen in Khmer is a change that almost prohibits CVC orthographic clusters. (I don't count nikahits, visargas or fragments of vowels as C.) However, that is a rule of the language; it does not need to be a rule of the script. I am not sure that the old and new rules should conflict. We are presumably talking about a change made before the script was soundly encoded; it seems unreasonable that renderers should suddenly refuse to handle text that was previously valid.

Now, I can think of a potential problem with Northern Thai ᨴᩘ᩠ᩃᩣ᩠ᨿ <U+1A34 TAI THAM LOW TA, U+1A58 MAI KANG LAI, U+1A60 SAKOT, U+1A43 LA, U+1A63 SIGN AA, U+1A60 SAKOT, U+1A3F LOW YA> 'all'. It is a single, chained orthographic syllable. This appears to be contrary to Tai Khün grammar, and it is not clear to me how a modern Tai Khün font should render it. (It's also contrary to USE, but so is most of the language.)

The problem is that U+1A58 is a final, spacing mark in Tai Khün, while further east it is a repha-like mark - it corresponds to kinzi in Burmese. The solution I anticipate is that it must be rendered as a non-spacing mark even in Tai Khün when it cannot be interpreted as a spacing mark. Has anyone handled this issue?

My intended solution will allow a common sequence of code points for the old style (U+1A58 as kinzi), the intermediate Northern Thai styles, and the new style (U+1A58 as a final consonant).
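For anyone who wants to inspect the sequence above, here is a minimal Python sketch that lists its code points using the standard `unicodedata` module. The helper `mai_kang_lai_followed_by_sakot` is hypothetical: it assumes, purely for illustration, that a U+1A58 directly followed by SAKOT cannot be read as a cluster-final spacing mark. That test is an assumption of this sketch, not an established rule and not necessarily the criterion a font or shaper would use.

```python
import unicodedata

SAKOT = "\u1A60"         # TAI THAM SIGN SAKOT
MAI_KANG_LAI = "\u1A58"  # TAI THAM SIGN MAI KANG LAI

# The Northern Thai word cited above, spelled out by code point.
WORD = "\u1A34\u1A58\u1A60\u1A43\u1A63\u1A60\u1A3F"


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def mai_kang_lai_followed_by_sakot(text):
    """Hypothetical check (an assumption of this sketch): return the indices
    of U+1A58 that are immediately followed by SAKOT, i.e. where the chain
    continues and the mark is not the last item of its cluster."""
    return [
        i
        for i, ch in enumerate(text[:-1])
        if ch == MAI_KANG_LAI and text[i + 1] == SAKOT
    ]


if __name__ == "__main__":
    for row in describe(WORD):
        print(*row)
    print("U+1A58 followed by SAKOT at:", mai_kang_lai_followed_by_sakot(WORD))
```

A renderer would of course make this decision in the font or shaping engine rather than in application code; the sketch only makes the structure of the sequence visible.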
> In the case that I cited, that combination of language/script was
> taken as out of scope for other reasons; now, for general text, are
> there situations where you'd want separate sets of rules for each
> language?

For determining which language a text might belong to, different rules would be appropriate. However, for deciding whether to render text, that seems ridiculous. Converting renderable multilingual text to plain text would make it unrenderable, which is surely undesirable.

Having said that, there do appear to be potential problems in the Lanna script arising from interactions of spelling and layout style. In some styles, the consonant (and vowel) stack turns right at a certain depth, and therefore can reasonably contain more items than a strictly vertical stack. As both styles appear in material published in Chiang Mai, I'd be loath to declare different validity rules. I'd rather treat any problems as the surfacing of a renderer limitation.

Richard.

