On Thu, 19 Jan 2017 14:25:14 -0800 Asmus Freytag <[email protected]> wrote:
> Now I'm thinking your focus was more on cases like the two Khmer
> subjoined consonant sequences:
> U+17D2 U+178A ្ដ KHMER CONSONANT SIGN COENG DA
> U+17D2 U+178F ្ត KHMER CONSONANT SIGN COENG TA
> that apparently have identical appearance, even though one is a 'd'
> and the other a 't'. (That's the only example that I'm personally
> familiar with).
> Unless some fonts ever make a distinction, this seems to be a case
> where "miscoding" might be an appropriate term. As far as the user is
> concerned, the issue only arises because of the encoding scheme used.
> (A hypothetical different scheme that had one of these precomposed
> with a name containing something like DA OR TA would have not
> surfaced an invisible distinction).

Such a font might be KHOM2004, mentioned by Michel Antelme in his paper
aefek.free.fr/iso_album/antelme_bis.pdf. On p25 he makes the point that
a distinct COENG DA was still on its last legs in Cambodia in the
1920s; it's still distinct in the Khom variety of the script.

This situation makes a good case for the Tibetan model. We might end up
making the Khmer script a mixed system like Tai Tham by adding a
character KHMER CONSONANT SIGN ARCHAIC COENG DA.

There seem to be some Arabic script analogues, where only one or two
forms differ between a pair of letters.

This is not the situation I was interested in, but it's clearly
related.

> Are your examples likewise legitimate duplications or merely the case
> that one could type something else and have it look the same
> (accidentally).

They're mostly legitimate duplications, though some may stretch
phonological credulity. For example, in Tai Tham, <NA, SAKOT, HIGH TA,
SIGN I> is part of a common Pali verb inflection and <NA, SIGN I,
SAKOT, HIGH TA> is a valid Northern Thai word (apparently not a Pali
loan, despite its spelling), but <MA, SAKOT, HIGH TA, SIGN I> would
probably be a miscoding of <MA, SIGN I, SAKOT, HIGH TA> (an attested
final syllable) if the language were Northern Thai. I suppose it's just
conceivable that the former might be the name of a fruit, but I'm not
aware of the syllabic nasal being written that way. A spell checker
would pick up most such errors, though getting the underlying problem
explained to the user might be difficult.

> The Khmer example would seem fairly resistant to automated correction
> if it is a free choice. If, instead, the immediately preceding
> consonant comes from two disjoined sets, for example if TA COENG TA
> was possible, but not TA COENG DA, then there's scope for spell check.

It's supposed to be based on the phonetics, so a spell check could be
used, but not a grammar rule. However, I can imagine someone writing in
accordance with a rule restricting them to certain bases.

Richard.
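
P.S. For anyone who wants to poke at the Khmer pair discussed above, here is a minimal Python sketch. It only shows that the two sequences are distinct code point strings that no Unicode normalization form unifies, which is why the distinction stays invisible to the user. The helper names (`describe`, `same_after_normalization`, `fold_coeng_da_ta`) are my own, and the folding function is a purely hypothetical search aid, not anything Unicode or a real spell checker prescribes.

```python
import unicodedata

COENG = "\u17D2"               # KHMER SIGN COENG
COENG_DA = COENG + "\u178A"    # COENG followed by KHMER LETTER DA
COENG_TA = COENG + "\u178F"    # COENG followed by KHMER LETTER TA


def describe(seq):
    """List each code point in seq with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


def same_after_normalization(a, b):
    """True if any standard normalization form maps a and b to the same string."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


def fold_coeng_da_ta(text):
    """Hypothetical folding for searching: treat COENG DA as COENG TA.

    This deliberately discards the d/t distinction, so it is only
    suitable for loose matching, never for correcting stored text.
    """
    return text.replace(COENG_DA, COENG_TA)


if __name__ == "__main__":
    for label, seq in (("COENG DA", COENG_DA), ("COENG TA", COENG_TA)):
        print(label, describe(seq))
    print("unified by normalization:",
          same_after_normalization(COENG_DA, COENG_TA))
```

Deciding which of the two is phonetically correct in a given word would still need a word list, as noted above; the folding only helps find candidates that differ in nothing else.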

