Re: Misspelling or Miscoding?

Asmus Freytag Thu, 19 Jan 2017 18:46:27 -0800

On 1/19/2017 5:04 PM, Richard Wordingham wrote:

On Thu, 19 Jan 2017 14:25:14 -0800
Asmus Freytag <[email protected]> wrote:

Now I'm thinking your focus was more on cases like the two Khmer
subjoined consonant sequences:
U+17D2 U+178A     ្ដ         KHMER CONSONANT SIGN COENG DA
U+17D2 U+178F     ្ត         KHMER CONSONANT SIGN COENG TA
that apparently have identical appearance, even though one is a 'd'
and the other a 't'. (That's the only example that I'm personally
familiar with).
Unless some fonts ever make a distinction, this seems to be a case
where "miscoding" might be an appropriate term. As far as the user is
concerned, the issue only arises because of the encoding scheme used.
(A hypothetical different scheme that had one of these precomposed
with a name containing something like DA OR TA would not have
surfaced an invisible distinction).
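
To make the "invisible distinction" concrete, here is a minimal Python sketch (an illustration only, not anything from the thread itself). It uses the standard `unicodedata` module to list the code points of the two sequences quoted above and to check whether any Unicode normalization form maps one onto the other. The helper names `describe` and `same_after_normalization` are made up for this example.

```python
import unicodedata

# The two sequences quoted above: COENG followed by DA or by TA.
COENG_DA = "\u17D2\u178A"
COENG_TA = "\u17D2\u178F"


def describe(seq):
    """Return 'U+XXXX NAME' for each code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


def same_after_normalization(a, b):
    """True if any standard normalization form makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


print(describe(COENG_DA))
print(describe(COENG_TA))
print(same_after_normalization(COENG_DA, COENG_TA))
```

Normalization does not fold the two together, so plain string comparison or search treats them as different text even where a font draws them identically.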

Such a font might be KHOM2004 mentioned by Michel Antelme in his paper
aefek.free.fr/iso_album/antelme_bis.pdf.  On p25 he makes the point
that a distinct COENG DA was still on its last legs in Cambodia in the
1920's; it's still distinct in the Khom variety of the script.  This
situation makes a good case for the Tibetan model.  We might end up
making the Khmer script a mixed system like Tai Tham by adding a
character KHMER CONSONANT SIGN ARCHAIC COENG DA.

There seem to be some Arabic script analogues, where only one or two
forms differ between a pair of letters.

Yes, and these are treated similarly to the Khmer case in label generation rulesets for domain names.


This is not the situation I was interested in, but it's clearly related.

Funny thing is, not actually knowing Khmer, I hadn't thought of the COENG DA as a "form of DA", but had considered the sequence its own entity.

In Latin you have two characters that look like a reversed e but have different upper cases, so that they have distinct encodings. (You could argue that picking the wrong member of a disunified set is a miscoding, but I think "misspelling" works fine -- in another context we limit the term "misspelling" to phono-something or typo/grapho-something *possible* spellings, and try to not restrict them for that purpose. The "impossible" ones are ones that we expect some font or renderer to not support on the basis that they are not needed, and those we do restrict; I wouldn't use the name "miscoding" for those, just "invalid" does nicely for us in that context).

The case where something (=member of or associated with an alphabet) is simply and fully identical in appearance in all contexts (and I regard script as a context) is fortunately quite rare in Unicode. Your examples may be the closest thing.

Are your examples likewise legitimate duplications or merely the case
that one could type something else and have it look the same
(accidentally)?

They're mostly legitimate duplications, though some may stretch
phonological credulity.  For example, in Tai Tham, <NA, SAKOT, HIGH TA,
SIGN I> is part of a common Pali verb inflection and <NA, SIGN I, SAKOT,
HIGH TA> is a valid Northern Thai word (apparently not a Pali loan,
despite its spelling), but <MA, SAKOT, HIGH TA, SIGN I> would probably
be a miscoding of <MA, SIGN I, SAKOT, HIGH TA> (an attested final
syllable) if the language were Northern Thai.  I suppose
it's just conceivable that the former might be the name of a fruit, but
I'm not aware of the syllabic nasal being written that way.

A spell checker would pick up most such errors, though getting the
underlying problem explained to the user might be difficult.

The Khmer example would seem fairly resistant to automated correction
if it is a free choice. If, instead, the immediately preceding
consonant comes from one of two disjoint sets, for example if TA COENG TA
was possible, but not TA COENG DA, then there's scope for spell check.

It's supposed to be based on the phonetics, so a spell check could be
used, but not a grammar rule.  However, I can imagine someone writing
in accordance with a rule restricting them to certain bases.

Your last sentence reads as if you might equally well have meant "can't" instead of "can" (?)

Having agreement in consonants or vowels across syllables or words isn't necessarily unheard of; spell checkers tend to go on the basis of existing lexical items, not necessarily purely productive rules. At least the ones I use for European languages have this annoying habit of not having a productive rule for compounds - even for languages that do allow arbitrary compound formation.


Anyway, digressing from your point.

A./
