On Wed, 1 Jan 2020 20:11:04 +0000 James Kass via Unicode <unicode@unicode.org> wrote:
> On 2020-01-01 11:17 AM, Richard Wordingham via Unicode wrote:
>
> > That's exactly the sort of mess that jack-booted renderers are
> > trying to minimise. Their principle is that there should be only
> > one encoding per shape, though to be fair:
> >
> > 1) some renderers accept canonical equivalents.
> > 2) tolerance may be allowed for ligating (ZWJ, ZWNJ, CGJ),
> > collating (CGJ, SHY) and line-breaking controls (SHY, ZWSP, WJ).
> > 3) Superseded chillu encodings are still supported.
>
> There was never any need for atomic chillu form characters.
> The principle of only one encoding per shape is best achieved when
> every shape gets an atomic encoding.

I should have written per-word shape. I should also have added that
most renderers attempt to handle Mongolian, despite its encoding Middle
Mongolian phonetics rather than characters. Also, they don't attempt to
sort out the per-language subsets of the Arabic script, which leads to a
bad mess at Wiktionary when Unicode characters differ only in a few
forms.

> Glyph-based encoding is incompatible
> with Unicode character encoding principles.

Visual encoding sometimes works - phonetic order for Thai is so
complicated that it is unsurprising that its definition is partly
missing from Unicode 1.0. The official history hides behind
incompatibility with the Thai national standard, but phonetic order was
simply too complicated for Thai. Additionally, Thais don't agree on
where preposed vowels go relative to Pali consonant clusters - they
don't agree that all of them should appear in the middle of the cluster.
(I suppose the positioning rule could have been made a stylistic feature
of fonts.)

An analogue is Lao collation. While syllable boundaries can
overwhelmingly be discerned in modern Lao, Lao collations are too
complicated to be accepted for ICU if they are to support anything but
single syllables.
CLDR collation (interpreted as a specification with the normal use of
specification language for the form of definitions) can just cope,
whereas the UCA can't, but the tables are huge.

Richard.