From: "Michael Everson" <[EMAIL PROTECTED]>
> At 06:35 -0700 2004-05-14, Peter Kirk wrote:
>
> >But there is an exceptional issue within the family of north-west
> >Semitic scripts, which may apply also to others e.g. Greek, Coptic
> >and archaic Greek - possibly also the Indic scripts.
>
> I don't think so.
>
> >Within these sets of scripts there is NO ambiguity about which
> >characters correspond to which, as they have identical repertoires,
> >with possibly additional letters in some of the scripts for which no
> >equivalent can be defined in the other scripts.
>
> That doesn't mean that an ordered list with them interfiled is in any
> way legible.

I do agree. UCA is designed first to produce a legible and consistent ordering for all kinds of readers, both experts and ordinary users who can read only one language or one script. We can interleave some variants that have an obvious relation to other well-known characters (accented letters are a good example, even if some may wonder why the thorn letters sort between T and U; since these letters are rare even in the languages that use them, this interleaving of variants does not make the ordering completely unreadable).

For search purposes, what some want is not really a collation order but equivalence relations. This is the same need as case folding, or case-insensitive searches. I see no objection to adding new types of string folding for those who would like to "fold" (in fact transliterate) Phoenician to Hebrew (the reverse being hard to implement consistently because of the various sets of Hebrew diacritics), or to Greek. There could even be a standard guideline for implementing such folding or transliteration (for the same reason that there are standard folding rules for case in Latin/Greek/Cyrillic, or for Katakana-to-Hiragana in Japanese).

Such folding belongs to the same area, with the same caveats (in terms of text interpretation), as custom normalizations or compatibility normalizations performed on unknown input text: some linguistic semantics are lost.
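
To make the idea of a search folding concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposed standard: it assumes Phoenician letters encoded at U+10900..U+10915 (as in the block later added to Unicode), pairs them one-to-one by alphabetical position with the 22 non-final Hebrew letters, and also folds Hebrew final forms and strips points and accents so that both sides reduce to the same key. The names `fold_key` and `fold_equal` are hypothetical.

```python
import unicodedata

# Assumption: Phoenician letters alf..tau occupy U+10900..U+10915.
PHOENICIAN = [chr(cp) for cp in range(0x10900, 0x10916)]

# Hebrew final forms folded to their non-final counterparts
# (final kaf, final mem, final nun, final pe, final tsadi).
FINAL_TO_BASE = {
    "\u05DA": "\u05DB",
    "\u05DD": "\u05DE",
    "\u05DF": "\u05E0",
    "\u05E3": "\u05E4",
    "\u05E5": "\u05E6",
}

# The 22 non-final Hebrew letters alef..tav, in alphabetical order.
HEBREW = [chr(cp) for cp in range(0x05D0, 0x05EB)
          if chr(cp) not in FINAL_TO_BASE]

assert len(PHOENICIAN) == len(HEBREW) == 22

# One table: Phoenician -> Hebrew, plus Hebrew final -> non-final.
FOLD_TABLE = str.maketrans({**dict(zip(PHOENICIAN, HEBREW)), **FINAL_TO_BASE})


def fold_key(text: str) -> str:
    """Return a search key; this is lossy and not a collation key."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (Hebrew points, cantillation, other accents).
    stripped = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    return stripped.translate(FOLD_TABLE)


def fold_equal(a: str, b: str) -> bool:
    """Equivalence relation for searching: equal after folding."""
    return fold_key(a) == fold_key(b)
```

With this sketch, a pointed Hebrew word ending in final kaf and the Phoenician sequence mem, lamd, kaf compare as equivalent, while ordering is left entirely to the collation algorithm. The folding goes one way only: once the points and final forms are dropped, the key cannot be turned back into the original Hebrew spelling, which is the loss of semantics mentioned above.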

