On 08/11/2004 01:28, Michael Everson wrote:

> At 22:45 +0000 2004-11-07, Peter Kirk wrote:
>
>> You have indeed stated an intention to encode "significant nodes".
>
> Yes. Based on the scholarly taxonomy of writing systems.
>
>> But the official documentation, the Unicode Standard, does not say anything like this.
>
> Alarm! Alarm! I detect a desire on your part to consider informative, explanatory text as normative.

No, my desire is that informative, explanatory text should not give misinformation and obfuscation. Note that I was not actually arguing for character encodings to follow this informative text, but for the text to be changed to match the reality of character encodings and the apparent decision of the UTC to encode scripts on the basis of significant nodes rather than semantic distinctions. And I was pointing out that many people, including an important group of Semitists, who (at least from your perspective) have misunderstood the situation are only following what they have read in informative, explanatory text in the Standard.

>> Rather, it states that Unicode encodes "Characters, Not Glyphs", and that "Characters are the abstract representations of the smallest components of written language that have SEMANTIC value" (TUS section 2.2 p.15, my emphasis on "SEMANTIC").
>
> Yes. ARABIC LETTER SHEEN is a different letter, and a different character from SYRIAC LETTER SHIN. DEVANAGARI LETTER KA is a different letter, and a different character, from ORIYA LETTER KA. PHOENICIAN LETTER NUN is a different letter, and a different character, from HEBREW LETTER NUN.

The first two, yes by definition, because they are in the Standard. The last one, only provisionally because it is subject to an ISO ballot.

>> And, Michael, I think you have agreed with me, and so with many scholars of Semitic languages, that the distinction between corresponding Phoenician and Hebrew letters (like that between corresponding Devanagari and Gujarati letters) is not a semantic one.
>
> LETTERS differ by semantics. SCRIPTS differ by other criteria WHETHER OR NOT TEXT AFFIRMING THIS HAS BEEN WRITTEN INTO THE UNICODE STANDARD YET.

Well, if what the Standard says is different from this, you can hardly be surprised that people are confused. One national representative on WG2 wrote to me offlist (with a copy to you, Michael, and several others) suggesting that I was doing something morally wrong in rejecting a semantic distinction between Hebrew and Phoenician. I replied telling him that you too consider the distinction not semantic. I haven't heard any more from him.

>> The conclusion we reach from reading the Standard is that these distinctions are glyph distinctions and so should not be encoded.
>
> You're wrong. You ignore the historical node-based distinctions which differentiate the Indic scripts one from the other, and which apply equally to Phoenician and Hebrew. And no, Fraktur and Sütterlin are not the same sort of thing.

The Standard ignores the historical node-based distinctions. I was trying to follow the Standard.

>> If it is indeed the position of the UTC that corresponding characters in "significant node" scripts should be encoded despite the lack of semantic distinctiveness,
>
> This is YOUR requisite.

Is this the position of the UTC? Or does the UTC hold that your "significant node" scripts are semantically distinct, although you disagree? Or does the UTC not in fact accept your principle that "significant node" scripts should be encoded, despite their decision on Phoenician? Perhaps this should be clarified first.

>> I would like to suggest an amendment to the standard to make this principle clear. This would of course have to be agreed with WG2. Until such an amendment has been put in place, there will continue to be opposition to encoding of any new scripts which do not show clear semantic distinctiveness and so appear to be in breach of the principles of the Standard.
>
> You're mistaken in your application of the concept of "semantic distinctiveness" with regard to script identity.

Well, I thought we were agreed on at least one thing, that the distinction between Phoenician and Hebrew should not be described as "semantic distinctiveness". And since, according to an informative part of the Standard, p.15, "semantic value" is the only criterion for a distinct character, it is hardly surprising that people are confused.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

