"karl williamson" <[email protected]> wrote: > This discussion doesn't make sense to me. The original proposal to > encode 19DA says that there is one set of digits in New Tai Lue, but > there is an extra digit '1' (the one that got put at 19DA), used when > the other digit '1' is visually confusable with another character in the > script, which it resembles. That makes it sound like the two are > essentially used as glyph variants of each other, and are > interchangeable as far as the computer recognizing an input number.
Yes, the exception will work for recognizing this digit as an exception for INPUT, but you still have a problem for output, because your library will need to know when to output the variant: if you always use the default digit 1, you will create a string that may be confusable to the reader, notably when it appears alone with no other digit. So you will still need an exception to change one or several of these digits 1 to the variant. Alternatively you could decide to always use the variant (which causes no confusion), but I'm not sure that such use would be valid in the target language. There may be complex rules deciding when the variant is needed and accepted, and when the default form is preferable and not confusable.

For Arabic there are clearly two separate sets of digits, but the possibility of mixing them arbitrarily is still a problem for IDNA (if both sets are accepted), notably because most digits (except 4 to 6) are completely identical. So registries will have to:

- either accept one set and reject the other one;
- or accept both, but only one within the same domain label, also reserving the label that uses the other set (as if the two labels were canonically equivalent).

Such equivalences (which are definitely not canonical) can be handled by tailored collation comparisons (operating at collation level 2 only, whereas non-IDN registries operate only at level 1), for which IDN registries will use their own tailoring.

I just see the IDN "StringPrep" as a particular application of the general concept of collation mappings (except that it was not designed on linguistic bases, but an IDN registry can be viewed as a locale for collation purposes). All these complex rules and mappings of IDN can be written in terms of a set of collation rules, added on top of the DUCET.
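To make the two cases above concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the function names (parse_new_tai_lue, registration_key, reserved_variant) are hypothetical, the registry policy shown is the "accept both sets, but only one per label" option, and plain code point tables stand in for a real collation tailoring. It does not attempt the output-side problem of deciding when the variant digit 1 should be emitted.

```python
# Sketch only: function names and the registry policy are hypothetical.

NTL_ZERO = 0x19D0        # NEW TAI LUE DIGIT ZERO
NTL_THAM_ONE = 0x19DA    # NEW TAI LUE THAM DIGIT ONE (variant form of 1)

AI_ZERO = 0x0660         # ARABIC-INDIC DIGIT ZERO
EAI_ZERO = 0x06F0        # EXTENDED ARABIC-INDIC DIGIT ZERO

# Translation tables mapping one Arabic digit set onto the other.
FOLD_TO_AI = {EAI_ZERO + i: AI_ZERO + i for i in range(10)}
FOLD_TO_EAI = {AI_ZERO + i: EAI_ZERO + i for i in range(10)}


def parse_new_tai_lue(text):
    """Read a New Tai Lue number, accepting U+19DA as an ordinary digit 1."""
    if not text:
        raise ValueError("empty number")
    value = 0
    for ch in text:
        cp = ord(ch)
        if cp == NTL_THAM_ONE:
            digit = 1                      # the input-side exception
        elif NTL_ZERO <= cp <= NTL_ZERO + 9:
            digit = cp - NTL_ZERO
        else:
            raise ValueError("not a New Tai Lue digit: U+%04X" % cp)
        value = value * 10 + digit
    return value


def registration_key(label):
    """Reject labels mixing both digit sets; fold the rest to one set."""
    uses_ai = any(AI_ZERO <= ord(c) <= AI_ZERO + 9 for c in label)
    uses_eai = any(EAI_ZERO <= ord(c) <= EAI_ZERO + 9 for c in label)
    if uses_ai and uses_eai:
        raise ValueError("label mixes two digit sets")
    return label.translate(FOLD_TO_AI)


def reserved_variant(label):
    """Return the same label written with the other digit set."""
    key = registration_key(label)
    other = key.translate(FOLD_TO_EAI)
    return other if label == key else key
```

With this sketch, a registry would treat two labels as the same registration when their registration_key values are equal, and would block reserved_variant(label) whenever label is registered. A real implementation would presumably express the same folding as a tailoring on top of the DUCET rather than as hard-coded tables.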

