John Hudson wrote:

....

I have been thinking today that part of the reason for the debate is that Unicode has a singular concept of 'script', a bucket into which variously shaped concepts of writing systems must be put or rejected. I don't think there is anything conceptually wrong with the idea that specific instances of a single script might be separately encoded if there is a need or desire to distinguish them in plain text. It just happens that Unicode has only one word that can be applied to such instances, and that is 'script'. It seems clear to me now that what Unicode calls a script needn't necessarily be what semiticists, or anyone else, calls a script. A functional Unicode definition of script might be formed as: a finite collection of characters that can be distinguished in plain text from other collections of characters.



John

"Script" is already defined in ISO 10646 as:

<<4.35 script: A set of graphic characters used for the written form of one or more languages.>>

and  "graphic character"  is defined as :

<< 4.20 graphic character: A character, other than a control function, that has a visual representation normally handwritten, printed, or displayed.>>

So I guess if any further definition of "script" is necessary, it should be based on this.

Further, the (draft?) ISO 15924 standard uses the same definition:

<< 3.7 script: A set of graphic characters used for the written form of one or more languages. (ISO/IEC 10646-1) (fr 3.6 écriture) >>

but adds an extra note:

<< NOTE 1: A script, as opposed to an arbitrary subset of characters, is defined in distinction to other scripts; in general, readers of one script may be unable to read the glyphs of another script easily, even where there is a historic relation between them (see 3.9). >>

[ 3.9 script variant: A particular form of one script which is so distinctive a rendering as to almost be considered a unique script in itself. (fr 3.9 variante d'écriture) ]
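(As an aside, this notion of a script as a set of characters that can be distinguished from other sets in plain text is something software can already query directly, via the Unicode Script property. A small Python sketch - assuming the third-party "regex" package, which exposes script properties; the standard "re" module does not - just to illustrate that the test is a plain-text one, independent of fonts or glyphs:

    import regex   # third-party package exposing Unicode script properties

    def scripts_used(text):
        """Report which of a few sample scripts occur in `text`."""
        found = set()
        for name in ("Latin", "Greek", "Hebrew", "Arabic"):
            if regex.search(r"\p{Script=%s}" % name, text):
                found.add(name)
        return found

    print(scripts_used("abc"))            # {'Latin'}
    print(scripts_used("\u03B1\u05D0"))   # {'Greek', 'Hebrew'}

Nothing in that test depends on which glyphs a font happens to supply for those characters.)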

With regard to historic & archaic scripts, TUS itself states:
"The overall capacity for more than a million characters is more than sufficient for all known character encoding requirements, including full coverage of all minority and historic scripts of the world." (1.0)


and

"As the universal character encoding scheme, the Unicode Standard must also respond to scholarly needs. To preserve world cultural heritage, important archaic scripts are encoded as proposals are developed." (1.1.2)

So there is a clear statement of purpose to give full coverage to *all* minority and historic scripts and to encode "important" archaic scripts.

In 1.2 "Design Goals" TUS states:
"The primary goal of the development effort for the Unicode Standard was to remedy two serious problems common to most multilingual computer programs. The first problem was the overloading of the font mechanism when encoding characters."


Telling people who propose a script that they can "just use a different font" could very easily contradict this stated goal.

There are very real issues of software implementation, font development, collation, text indexing and searching, etc. that arise from encoding multiple instances of what some users consider a single script, whether users in general opt to make the distinction in plain text or not, by using the separate character collections or unifying text in a single character collection and making the distinction at a higher level. I'm beginning to think that our time would be better spent thinking about those issues.

These are of course real issues - particularly collation, text indexing, searching and, where a written language occurs in several scripts, the ability to display text encoded in one script with the glyphs of another. Establishing standard, straightforward and widely supported means of dealing with these issues is a worthy goal. In many cases the solutions to these problems are in fact already specified or pretty clear - and, relatively speaking, reasonably straightforward to implement.
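For example - purely as an illustrative sketch, nothing normative - folding one separately encoded character collection onto a related one for searching and indexing is little more than a code-point mapping table. The Python fragment below assumes, for the sake of the example, a Phoenician letter collection at U+10900..U+10915 mapped one-to-one onto the Hebrew non-final letters; any other pair of collections would work the same way:

    # Hypothetical search-folding table: map each Phoenician letter onto the
    # corresponding Hebrew (non-final) letter before indexing or matching.
    PHOENICIAN = [chr(cp) for cp in range(0x10900, 0x10916)]          # ALF .. TAU
    HEBREW = [chr(cp) for cp in
              (0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
               0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
               0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA)]       # 22 letters
    FOLD = str.maketrans(dict(zip(PHOENICIAN, HEBREW)))

    def fold_for_search(text):
        """Fold Phoenician-range letters onto Hebrew ones before search/indexing."""
        return text.translate(FOLD)

    # A query typed with Hebrew letters matches text stored with Phoenician ones.
    stored = "\U00010900\U0001090B"    # ALF, LAMD
    query  = "\u05D0\u05DC"            # ALEF, LAMED
    assert fold_for_search(stored) == fold_for_search(query)

A collation tailoring that treats the two collections as equivalent, or a higher-level transliteration, amounts to much the same table with richer data behind it.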

The absence of such solutions - or lack of support for them - should not be a reason to reject a script proposal on the grounds that "it will cause difficulties"; this is the sort of argument put forward by PR China when they submitted their proposal for a host of precomposed Tibetan characters. When Indic scripts were first encoded, a whole software infrastructure and font/rendering technologies which were not then available in common desktop operating systems were assumed - and it has taken a decade for that encoding to be anything like widely supported on a practical level.

IMO, in the long term, encoding of archaic scripts is going to benefit the whole scholarly community. When children discover all kinds of scripts on their computers, they are going to become curious and play with them, and some of them will be inspired to go out and find out more about these scripts. Some of these will develop a serious interest, and a few will end up being the palaeographers, Semiticists, Sanskritists and so on of tomorrow.


- Chris



