On 1/19/2017 5:04 PM, Richard Wordingham wrote:
On Thu, 19 Jan 2017 14:25:14 -0800
Asmus Freytag <[email protected]> wrote:

Now I'm thinking your focus was more on cases like the two Khmer
subjoined consonant sequences:
U+17D2 U+178A     ្ដ         KHMER CONSONANT SIGN COENG DA
U+17D2 U+178F     ្ត         KHMER CONSONANT SIGN COENG TA
that apparently have identical appearance, even though one is a 'd'
and the other a 't'. (That's the only example that I'm personally
familiar with).
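
To make the two sequences concrete, here is a minimal Python sketch that lists the code points behind each one. The helper name `describe` is just an illustration, and the character names come from Python's bundled Unicode database. The sequences differ only in their second code point, and normalization does not unify them:

```python
import unicodedata

# The two subjoined-consonant sequences quoted above.
COENG_DA = "\u17D2\u178A"
COENG_TA = "\u17D2\u178F"


def describe(seq):
    """Return 'U+XXXX NAME' for every code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for label, seq in (("COENG DA", COENG_DA), ("COENG TA", COENG_TA)):
    print(label, describe(seq))

# The sequences stay distinct under canonical normalization, so any
# identical appearance is purely a rendering matter.
same_after_nfc = (
    unicodedata.normalize("NFC", COENG_DA)
    == unicodedata.normalize("NFC", COENG_TA)
)
print("identical after NFC:", same_after_nfc)
```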
Unless some fonts ever make a distinction, this seems to be a case
where "miscoding" might be an appropriate term. As far as the user is
concerned, the issue only arises because of the encoding scheme used.
(A hypothetical different scheme that had one of these precomposed
with a name containing something like DA OR TA would not have
surfaced an invisible distinction).
Such a font might be KHOM2004 mentioned by Michel Antelme in his paper
aefek.free.fr/iso_album/antelme_bis.pdf.  On p. 25 he makes the point
that a distinct COENG DA was still on its last legs in Cambodia in the
1920s; it's still distinct in the Khom variety of the script.  This
situation makes a good case for the Tibetan model.  We might end up
making the Khmer script a mixed system like Tai Tham by adding a
character KHMER CONSONANT SIGN ARCHAIC COENG DA.

There seem to be some Arabic script analogues, where only one or two
forms differ between a pair of letters.
Yes, and these are treated similarly to the Khmer case in label generation rulesets for domain names.
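
As a rough illustration of that kind of treatment (not the format or algorithm of any actual ruleset), one could fold one sequence onto the other before comparing labels. The table `VARIANT_FOLD` and the function names below are hypothetical, and the choice of COENG TA as the fold target is arbitrary:

```python
COENG = "\u17D2"
DA, TA = "\u178A", "\u178F"

# Hypothetical variant table: map COENG DA onto COENG TA.
# Only the subjoined forms are folded; the base letters stay distinct.
VARIANT_FOLD = {COENG + DA: COENG + TA}


def variant_key(label):
    """Return the label with every listed variant sequence folded."""
    key = label
    for source, target in VARIANT_FOLD.items():
        key = key.replace(source, target)
    return key


def are_variants(a, b):
    """True if two different labels collapse to the same folded key."""
    return a != b and variant_key(a) == variant_key(b)


KA = "\u1780"  # KHMER LETTER KA, used here only as a sample base
print(are_variants(KA + COENG + DA, KA + COENG + TA))  # subjoined forms
print(are_variants(DA, TA))                            # full letters
```

Under this sketch the two subjoined forms compare as variants of each other, while the full letters DA and TA remain distinct.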

This is not the situation I was interested in, but it's clearly related.
Funny thing is, not actually knowing Khmer, I hadn't thought of the COENG DA as a "form of DA", but had considered the sequence its own entity.

In Latin you have two characters that look like a reversed e but have different upper cases, so that they have distinct encodings. (You could argue that picking the wrong member of a disunified set is a miscoding, but I think "misspelling" works fine -- in another context we limit the term "misspelling" to phono-something or typo/grapho-something *possible* spellings, and try not to restrict them for that purpose. The "impossible" ones are ones that we expect some font or renderer not to support on the basis that they are not needed, and those we do restrict; we wouldn't use the name "miscoding" for those, just "invalid" does nicely for us in that context.)
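
The Latin pair is not named above; assuming it is U+01DD and U+0259 (my guess at the characters meant), a short Python check shows the different uppercase mappings that keep them apart:

```python
import unicodedata

# Assumed pair: two lowercase letters with a reversed-e shape.
TURNED_E = "\u01DD"
SCHWA = "\u0259"

for ch in (TURNED_E, SCHWA):
    upper = ch.upper()
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
        f"U+{ord(upper):04X} {unicodedata.name(upper)}"
    )

# The lowercase forms may look alike, but their uppercase forms differ.
uppercase_forms_differ = TURNED_E.upper() != SCHWA.upper()
print("uppercase forms differ:", uppercase_forms_differ)
```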

The case where something (=member of or associated with an alphabet) is simply and fully identical in appearance in all contexts (and I regard script as a context) is fortunately quite rare in Unicode. Your examples may be the closest thing.

Are your examples likewise legitimate duplications or merely the case
that one could type something else and have it look the same
(accidentally)?
They're mostly legitimate duplications, though some may stretch
phonological credulity.  For example, in Tai Tham, <NA, SAKOT, HIGH TA,
SIGN I> is part of a common Pali verb inflection and <NA, SIGN I, SAKOT,
HIGH TA> is a valid Northern Thai word (apparently not a Pali loan,
despite its spelling), but <MA, SAKOT, HIGH TA, SIGN I> would probably
be a miscoding of <MA, SIGN I, SAKOT, HIGH TA> (an attested final
syllable) if the language were Northern Thai.  I suppose
it's just conceivable that the former might be the name of a fruit, but
I'm not aware of the syllabic nasal being written that way.
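
For readers who want to see the Tai Tham orderings as code points, a small Python sketch follows. It assumes the shorthand names above correspond to the Unicode character names TAI THAM LETTER NA, TAI THAM SIGN SAKOT, TAI THAM LETTER HIGH TA and TAI THAM VOWEL SIGN I; the helper names are made up for the illustration. It shows that the two spellings use the same characters in a different order and are not made equal by normalization:

```python
import unicodedata


def tai_tham(*names):
    """Build a string from Tai Tham character names (prefix omitted)."""
    return "".join(unicodedata.lookup("TAI THAM " + n) for n in names)


# <NA, SAKOT, HIGH TA, SIGN I>: the Pali verb inflection.
PALI_INFLECTION = tai_tham(
    "LETTER NA", "SIGN SAKOT", "LETTER HIGH TA", "VOWEL SIGN I"
)
# <NA, SIGN I, SAKOT, HIGH TA>: the Northern Thai word.
NORTHERN_THAI_WORD = tai_tham(
    "LETTER NA", "VOWEL SIGN I", "SIGN SAKOT", "LETTER HIGH TA"
)


def same_characters(a, b):
    """True if a and b contain the same characters in any order."""
    return sorted(a) == sorted(b)


def canonically_equal(a, b):
    """True if a and b are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print([f"U+{ord(c):04X}" for c in PALI_INFLECTION])
print([f"U+{ord(c):04X}" for c in NORTHERN_THAI_WORD])
print(same_characters(PALI_INFLECTION, NORTHERN_THAI_WORD))
print(canonically_equal(PALI_INFLECTION, NORTHERN_THAI_WORD))
```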

A spell checker would pick up most such errors, though getting the
underlying problem explained to the user might be difficult.

The Khmer example would seem fairly resistant to automated correction
if it is a free choice. If, instead, the immediately preceding
consonant comes from two disjoined sets, for example if TA COENG TA
was possible, but not TA COENG DA, then there's scope for spell check.
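
The hypothetical restriction in that paragraph can be sketched as a simple check. The set `BASES_TAKING_ONLY_COENG_TA` holds only the made-up example from the text (TA), not a real rule of Khmer orthography, and the function name is likewise invented:

```python
COENG = "\u17D2"
DA, TA = "\u178A", "\u178F"

# Hypothetical rule taken from the example in the text: after base TA,
# only COENG TA is possible. Real data would have to come from a linguist.
BASES_TAKING_ONLY_COENG_TA = {TA}


def suspicious_coeng_da(text):
    """Return the indexes of bases that are followed by a disallowed COENG DA."""
    hits = []
    for i in range(len(text) - 2):
        if (
            text[i] in BASES_TAKING_ONLY_COENG_TA
            and text[i + 1] == COENG
            and text[i + 2] == DA
        ):
            hits.append(i)
    return hits


print(suspicious_coeng_da(TA + COENG + DA))  # flagged under the made-up rule
print(suspicious_coeng_da(TA + COENG + TA))  # accepted
```

A check of this kind only works if the preceding consonant really does determine the choice; where the choice is free, it has nothing to go on.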
It's supposed to be based on the phonetics, so a spell check could be
used, but not a grammar rule.  However, I can imagine someone writing
in accordance with a rule restricting them to certain bases.
Your last sentence reads as if you might equally well have meant "can't" instead of "can" (?)

Having agreement in consonants or vowels across syllables or words isn't necessarily unheard of; spell checkers tend to go on the basis of existing lexical items, not necessarily purely productive rules. At least the ones I use for European languages have this annoying habit of not having a productive rule for compounds - even for languages that do allow arbitrary compound formation.

Anyway, digressing from your point.

A./
