On 08/11/2004 20:06, Edward H. Trager wrote:
...
While the Unicode code space is by definition mathematically finite, it is still
for all practical purposes a very large code space that should be able to
accommodate the "legitimate needs" of scholars, researchers, and historians,
among others. Regardless of whether one agrees completely about the encoding
of Phoenician in Unicode, I --perhaps naively, I admit-- fail to see how it does
any more harm than the encoding of that HUGE number of "CJK Unified Ideographs
Extension B" characters which, as far as I can tell (given my lack of scholarship
in this area), are of more use to esoteric scholars than to ordinary speakers
and writers of Chinese, Japanese, or Korean.
It is no worse than the encoding of a large number of Arabic ligatures --a clear
case of encoding glyphs, not characters-- which Unicode took on to support legacy
Arabic encodings that already existed when Unicode came along.
Thankfully a similar thing did not happen for, say, Syriac. It is no worse than
the encoding of Hangul syllables.
I don't closely follow what additional planes of Unicode are being designated for, but perhaps there should be a plane set aside for the encoding of historical
"script nodes" that would be useful to scholars, but not as useful to others. Then again, perhaps I'm too naive in this area to know what I'm talking about ... ;-)
Thank you for your mostly helpful comments.
But I would like to address your argument that it does no harm to add additional characters which people can use or not use as they please. I disagree, as a matter of general principle. The aim of Unicode standardisation is surely to define a single, unambiguous representation of text. That requires a single code point for each character, or at most a set of canonically equivalent representations.

Where alternative representations exist for historical reasons, e.g. the Arabic presentation forms, their use is clearly (though sometimes not clearly enough) deprecated, and they usually have compatibility decompositions anyway. But if we reach the position where there is more than one (not canonically equivalent) way of representing the same text, we are moving quickly away from standardisation. There may be good reasons for some departures, and their impact can be minimised by mechanisms like compatibility decompositions and folding together for collation. But the suggestion of encoding alternative representations of variant script forms, for use alongside the original ones, is likely to lead rapidly to chaos.
Imagine, for example, if Fraktur were defined as a "historical script node" on your scheme, for use by scholars only. Some scholars would then encode texts with the special Fraktur characters, while others, along with the general public, would continue to encode them as they do now, as glyph variants of Latin script. The result would quickly be chaos.
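The compatibility-decomposition point can be illustrated with Python's standard `unicodedata` module (a sketch of mine, not part of the original discussion): a legacy Arabic presentation-form ligature folds back to the ordinary letters under NFKC, and the Fraktur letter shapes Unicode does encode (as mathematical alphanumeric symbols, not as a script) likewise fold straight back to plain Latin.

```python
import unicodedata

# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM: a legacy
# presentation form whose compatibility decomposition is the
# ordinary letters LAM (U+0644) + ALEF (U+0627).
lam_alef = "\uFEFB"
print(unicodedata.normalize("NFKC", lam_alef) == "\u0644\u0627")  # True

# U+1D504 MATHEMATICAL FRAKTUR CAPITAL A: Fraktur shapes exist in
# Unicode only as mathematical symbols, and NFKC folds them back
# to the plain Latin letter.
fraktur_a = "\U0001D504"
print(unicodedata.normalize("NFKC", fraktur_a))  # A
```

In both cases the "variant" representation is explicitly second-class: normalization collapses it onto the ordinary encoding, which is exactly the safety valve that a parallel "scholars-only" script encoding would lack.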
... (omitted by request)
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/

