On 25/05/2004 12:14, Kenneth Whistler wrote:

> Peter,



>>>> There is no consensus that this Phoenician proposal is necessary. I and others have also put forward several mediating positions e.g. separate encoding with compatibility decompositions


>>> Which was rejected by Ken for good technical reasons.


>> I don't remember any technical reasons, it was more a matter of "we haven't done it this way before".


> The *reason* why we haven't done it this way before is because
> it would cause technical difficulties.



I am revisiting this one because I realise now that Ken has been somewhat economical with the truth here. There ARE cases in which entire alphabets have been given compatibility decompositions to other alphabets: for example, the Mathematical Alphanumeric Symbols, the Enclosed Alphanumerics, and the Fullwidth and Halfwidth Forms, as well as superscripts, subscripts, modifier letters etc. These characters have compatibility decompositions because they are not considered to form separate scripts, but rather to be glyph variants of characters in the Latin, Greek, Katakana etc. scripts. Do these compatibility decompositions cause technical difficulties?
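To illustrate with what the standard already defines (a quick check in Python, using the standard library's unicodedata module):

    import unicodedata

    # Characters from blocks that are compatibility variants of Latin
    # letters and digits; NFKC folds each to its plain counterpart.
    for ch in ("\U0001D400",   # MATHEMATICAL BOLD CAPITAL A
               "\uFF21",       # FULLWIDTH LATIN CAPITAL LETTER A
               "\u24B6",       # CIRCLED LATIN CAPITAL LETTER A
               "\u00B2"):      # SUPERSCRIPT TWO
        print(unicodedata.name(ch), "->",
              unicodedata.normalize("NFKC", ch))
    # The first three all fold to plain "A"; the last folds to "2".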


> Compatibility decompositions directly impact normalization.



Of course. And the point of suggesting compatibility decompositions here is precisely so that compatibility normalisation, as well as the default collation, folds together Phoenician and square Hebrew as variant glyph forms of the same script.
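Concretely, what I am suggesting would amount to an entry of the following form in UnicodeData.txt. The first line is the real entry for a fullwidth letter; the second is hypothetical - the code point and name are those of the Phoenician proposal, and the <compat> decomposition is my suggestion, not anything that has been defined:

    FF21;FULLWIDTH LATIN CAPITAL LETTER A;Lu;0;L;<wide> 0041;;;;N;;;;FF41;
    10900;PHOENICIAN LETTER ALF;Lo;0;R;<compat> 05D0;;;;N;;;;;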


> Cross-script equivalencing is done by transliteration algorithms,
> not by normalization algorithms.



This begs the question. Scholars of Semitic languages do not accept that this is a cross-script issue. They do not accept that representing a Phoenician, palaeo-Hebrew etc. inscription with square Hebrew glyphs is transliteration. Rather, for them it is a matter of replacing an obsolete or non-standard glyph with a modern standard glyph for the same character - just as one would not call it transliteration to represent in Times New Roman a Latin-script text originally written in mediaeval handwriting or in Fraktur.


> If you try to blur the boundary between those two by introducing
> compatibility decompositions to equate across separately encoded
> scripts, the net impact would be to screw up *both* normalization
> and transliteration by conflating the two. You
> would end up with confusion among both the implementers of
> such algorithms and the consumers of them.



I would suggest that a clear distinction should be made, in an appropriate part of the Unicode Standard, between transliteration (between separate scripts) and what one might call glyph normalisation (between variant forms of the same script).
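To make the distinction concrete (a sketch in Python; the Greek-to-Latin table is a toy fragment of my own, but the long s example is already in the standard):

    import unicodedata

    # Transliteration: cross-script, convention-dependent, and not
    # in general reversible. A toy fragment for illustration only:
    GREEK_TO_LATIN = str.maketrans({"α": "a", "β": "b", "γ": "g"})

    def transliterate(text: str) -> str:
        return text.translate(GREEK_TO_LATIN)

    # Glyph normalisation: folding a variant form to the standard
    # form of the *same* character in the *same* script. The standard
    # already does this for U+017F LATIN SMALL LETTER LONG S:
    assert unicodedata.normalize("NFKC", "\u017F") == "s"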

>> But perhaps that is only because the need to do this has not previously been identified.


> No, that is not the case.



>> However, I can make a good case for the new Coptic letters being made compatibility equivalent to Greek - which can still be done, presumably -


> But will not be done. If you attempted to make your case, you
> would soon discover that even *if* such cross-script equivalencing
> via compatibility decompositions were a good idea (which it isn't),
> you would end up with inconsistencies, because some of the Coptic
> letters would have decompositions and some could not (because they
> are already in the standard without decompositions). You'd end
> up with a normalization nightmare (where some normalization
> forms would fold Coptic and Greek, and other normalization
> forms would not), while not having a transliteration solution.



This is not intended as a transliteration solution. It is intended to recognise that *some* Coptic letters are glyph variants of Greek letters, as previously recognised by the UTC, whereas *others* are not. As a result, only the former set would have compatibility decompositions - and, as it happens, those are precisely the ones proposed for new encoding, and so precisely the ones for which compatibility decompositions can still be defined. This also has the major advantage of folding together, for normalisation and default collation, texts encoded according to the existing definitions for Coptic and texts which will be encoded according to the new definitions.
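A sketch of what I mean, in Python; the new Coptic code points are those of the current proposal and so should be treated as provisional:

    # Proposed new Coptic letters that are glyph variants of Greek
    # letters, and so could be given compatibility decompositions:
    COPTIC_TO_GREEK = {
        "\u2C80": "\u0391",  # COPTIC CAPITAL LETTER ALFA -> GREEK ALPHA
        "\u2C81": "\u03B1",  # COPTIC SMALL LETTER ALFA   -> GREEK SMALL ALPHA
        "\u2C82": "\u0392",  # COPTIC CAPITAL LETTER VIDA -> GREEK BETA
        # ... and so on for the Greek-derived letters only.
    }
    # The Demotic-derived letters already encoded at U+03E2..U+03EF
    # (SHEI, FEI, KHEI, HORI, GANGIA, SHIMA, DEI) are not glyph
    # variants of Greek letters, and so would correctly get no
    # decomposition.

    def fold_coptic(text: str) -> str:
        return "".join(COPTIC_TO_GREEK.get(c, c) for c in text)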


But I accept that this Coptic-to-Greek compatibility has a few problems, because not all the characters have mappings. However, this is not a problem for Phoenician, because *every* Phoenician character has an unambiguous compatibility mapping to an existing Hebrew character.
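Here is the complete mapping, sketched in Python; the Phoenician code points are those of the proposal (U+10900..U+10915), and the table can equally be read as the set of compatibility decompositions I am suggesting:

    # The 22 proposed Phoenician letters, in code point order, mapped
    # one-to-one to the 22 Hebrew base letters (final forms excluded):
    HEBREW = [
        0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5,  # alef..vav
        0x05D6, 0x05D7, 0x05D8, 0x05D9, 0x05DB, 0x05DC,  # zayin..lamed
        0x05DE, 0x05E0, 0x05E1, 0x05E2, 0x05E4, 0x05E6,  # mem..tsadi
        0x05E7, 0x05E8, 0x05E9, 0x05EA,                  # qof..tav
    ]
    PHOENICIAN_TO_HEBREW = {
        chr(0x10900 + i): chr(h) for i, h in enumerate(HEBREW)
    }

    def fold_phoenician(text: str) -> str:
        # The folding that <compat> decompositions would give us
        # automatically under NFKC/NFKD.
        return "".join(PHOENICIAN_TO_HEBREW.get(c, c) for c in text)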

> ...
> I don't like the notion of interleaving in the default weighting
> table, and have spoken against it, but as John Cowan has pointed
> out, it is at least feasible. It doesn't have the ridiculousness
> factor of the compatibility decomposition approach.



If what I have suggested is ridiculous, so is what the UTC has already defined for Mathematical Alphanumeric Symbols.
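And interleaving, for comparison, amounts to something like the following sketch, reusing the mapping above; a real implementation would of course tailor the weighting table itself rather than build sort keys by hand:

    # Interleaved weights as a sort key: the primary weight comes
    # from the folded (Hebrew) letter, with the script deciding only
    # at a lower level, so that each Phoenician letter sorts with,
    # rather than far away from, its Hebrew counterpart.
    def interleaved_key(text: str):
        primary = fold_phoenician(text)
        script_level = [c in PHOENICIAN_TO_HEBREW for c in text]
        return (primary, script_level)

    # Usage: sorted(words, key=interleaved_key)

Which is to say, interleaving achieves at the collation level exactly the folding that compatibility decompositions would give throughout.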


> ...
> The equivalencing of 22 Phoenician letters, one-to-one against
> Hebrew characters, where the mapping is completely known and
> uncontroversial, is a minor molehill.



Well, why not make these uncontroversial equivalences, between variant glyphs of the same script, into compatibility decompositions?



--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



