From: "Arcane Jill" <[EMAIL PROTECTED]> > Ignoring all compatibility characters; ignoring everything that has gone > before; and considering only present and future characters (that is, > characters currently under consideration for inclusion in Unicode, and > characters which will be under consideration in the future), which of > the following is the PRINCIPLE which decides whether or not a character > is suitable: > > (A) A proposed character will be rejected if its glyph is identical in > appearance to that of an extant glyph, regardless of its semantic > meaning, or > (B) A proposed character will be rejected if its semantic meaning is > identical to that of an extant character, regardless of the appearance > of its glyph, or > (C) A proposed character will be rejected if either (A) or (B) are true, or > (D) None of the above > ? > > Although this is a question about the future, no clairvoyance is > required, since I am asking about the principle behind decisions, not > about specific characters.
(D), unambiguously. There is no normative glyph in Unicode: the standard specifies only a single representative glyph, whose purpose is to exhibit the identity of the character and to distinguish it from the other encoded characters of the same script (or sometimes of other scripts as well; and there are many counterexamples where even the representative glyphs of distinct characters look the same).

In my opinion, the main reason a new "similar" character needs to be encoded is that the existing character's normative properties do not fit some linguistic usage, or create false interpretations of text in some language. Or the character's glyph was borrowed from another script which behaves very differently overall (see the various symbols and letters that look like a Greek uppercase Lambda but have very distinct histories of use, very different applications and properties, and would be used inconsistently if they were simply borrowed from a foreign script without gaining a new identity in the new script). If a problem cannot be corrected by adding more glyph substitution rules in fonts, to render the text the way the authors want, or if basic text handling produces wrong results because of a normative behavior (for example Bidi properties, case mappings, decompositions and canonical reordering of diacritics), then a new character needs to be added.

Look, for example, at how the various forms of D with stroke, which look very similar or identical in uppercase, are given distinct code points: this is needed because they have very distinct lowercase mappings, and the lowercase versions should not be mixed, as they have different identities (the small code sketch below makes this concrete). Another example is some Greek letters whose letterforms were borrowed into Latin but with distinct case mappings: the uppercase version of Latin Esh looks very similar or identical to the Greek uppercase Sigma. Yet another example is the new mathematical symbols, for which no case mappings are acceptable, since lowercase and uppercase versions need to remain distinct symbols. We will probably soon see new characters added to Hebrew, because of problems in the interpretation of Biblical texts, or simply because the need cannot be met by symbols or letters borrowed from other scripts, as those have the wrong character properties for use in Hebrew.

Unicode just needs to encode what is needed to preserve the identity of the encoded text without losing parts of its semantics. Unicode will also make efforts to ensure that a single script is enough to represent a given language, at least at the lexical level (exceptions exist, for example in Japanese, which mixes several scripts in the same text: Hiragana, Katakana and Han; but I think this does not affect the lexical level), so that a text in some language need not mix characters from many blocks. This simplifies implementations, as it reduces the number of code point blocks to support for a language (and I see it as a good reason why letters borrowed into romanized text from other scripts such as Cyrillic and Greek were added to the Latin blocks with separate code points).
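To make the case-mapping point concrete, here is a minimal sketch in Python using the standard unicodedata module (the specific characters are my own choice of illustration, not taken from the question):

    import unicodedata

    # Look-alike capitals that carry distinct identities and, hence,
    # distinct lowercase mappings (or none at all).
    capitals = [
        "\u00D0",      # LATIN CAPITAL LETTER ETH
        "\u0110",      # LATIN CAPITAL LETTER D WITH STROKE
        "\u0189",      # LATIN CAPITAL LETTER AFRICAN D
        "\u03A3",      # GREEK CAPITAL LETTER SIGMA
        "\u01A9",      # LATIN CAPITAL LETTER ESH
        "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A (no case mapping)
    ]

    for ch in capitals:
        low = ch.lower()
        name = unicodedata.name(ch)
        if low != ch:
            print(f"U+{ord(ch):04X} {name} -> "
                  f"U+{ord(low):04X} {unicodedata.name(low)}")
        else:
            print(f"U+{ord(ch):04X} {name} -> no lowercase mapping")

Running this shows Eth, D with stroke and African D lowercasing to three different letters (U+00F0, U+0111, U+0256), and Esh lowercasing to U+0283 while Sigma lowercases to U+03C3, whereas the mathematical bold capital stays unchanged. Identifying these characters by their uppercase glyphs alone would destroy those distinctions.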
Maybe Unicode members have distinct views about it, but this seems to be what is needed to allow consistent handling of text in its encoded form, without reference to graphical considerations such as glyph processing, positioning, or reordering. This lets a renderer use whatever font design respects the character identity (see the wide range of glyph styles that exist in Latin or Arabic, for which a very rich and complex set of calligraphic designs has been created throughout centuries and millennia).
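As a small illustration of what "handling text in its encoded form" means in practice, here is one more hedged sketch (Python again; the combining sequence is my own example). Canonical reordering is defined entirely by the characters' combining classes, with no reference to how any font draws the marks:

    import unicodedata

    # e + COMBINING ACUTE ACCENT (ccc 230) + COMBINING DOT BELOW (ccc 220),
    # entered in either order; both sequences render identically.
    a = "e\u0301\u0323"
    b = "e\u0323\u0301"

    # Canonical decomposition reorders the marks by combining class,
    # so both spellings become the same sequence of code points.
    print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # True

    for ch in unicodedata.normalize("NFD", a):
        print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")

The same holds for the other normative behaviors mentioned above (Bidi classes, case mappings): they are looked up per character, so a character encoded with the wrong properties cannot be repaired by the renderer.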

