mixed-script writing systems

Peter_Constable Fri, 15 Nov 2002 10:16:20 -0800

One of the Unicode design principles is unification: "unify across
languages, but not across scripts". As a result, the "A" used in all
Latin-based writing systems is the same character, but that character is
different from the "A" used in Cyrillic- or Greek-based writing systems.
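
To make the principle concrete, here is a minimal sketch, assuming Python and its standard unicodedata module (neither is part of the discussion itself). It only shows that the three look-alike capital letters are separate characters with separate names.

```python
import unicodedata

# Three capital letters that look alike but belong to different scripts.
lookalikes = ["\u0041", "\u0391", "\u0410"]

for ch in lookalikes:
    # Print the code point and the formal Unicode character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The characters are visually similar but never compare equal.
print(len(set(lookalikes)))
```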


There are a very small number of cases of truly eclectic writing systems:
the IPA transcription system uses mostly Latin characters, but also uses a
few Greek characters, and Japanese writing mixes three scripts (complete
repertoires of two scripts, and a large portion of a third script).  (One
might debate whether we should describe Japanese writing in terms of a
single writing system involving three scripts, or simultaneous use of three
writing systems. I have been inclined toward the former, but that's another
topic.) Of course, digits and punctuation get shared, but the norm is that
a writing system for a given language is based on a single script, and IPA
and Japanese are clearly exceptions.

That intro may well spawn a number of sub-threads, but I'm interested in
only one question. It has to do with an Asian language, Wakhi
(http://www.ethnologue.com/show_language.asp?code=WBL). This is spoken in
Afghanistan, China, Pakistan and Tajikistan (reportedly with similar
populations in each country). I don't know if the same writing system is
used in all countries, but there is at least one writing system, which is
Latin-based. (There appears also to be a distinct Cyrillic-based writing
system in use.)

What is unusual about this Latin-based writing system is that its creators
(I don't know who they were) were a little bit eclectic: whereas most of
the characters are from the Latin script, it also uses three Greek
characters and one Cyrillic character: gamma, delta, theta, and Cyrillic
yeru (U+042B, U+044B). I've attached a GIF of a sample page from a
publication that shows all four of these characters (though not in both
upper and lower case; note that the gamma is also used with a combining
caron to create another grapheme).

(The gamma is designed like the Greek gamma, U+0393 / U+03B3, and not the
Latin gamma, U+0194 / U+0263. Also, it uses an ezh, which could possibly be
represented as the Cyrillic characters "Abkhasian Dze / dze" U+04E0 /
U+04E1, but given that the vast majority of characters are Latin, it makes
more sense to consider these to be the Latin characters Ezh / ezh, U+01B7 /
U+0292.)
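
For reference, the code points mentioned above can be looked up by name. This is a minimal sketch, assuming Python's standard unicodedata module. The last lines check whether gamma plus combining caron (U+030C) composes to a single code point under NFC in the Unicode data bundled with the interpreter.

```python
import unicodedata

# Code points named in the text above (capital / small pairs).
mentioned = [
    0x0393, 0x03B3,  # Greek gamma
    0x0194, 0x0263,  # Latin gamma
    0x01B7, 0x0292,  # Latin ezh
    0x04E0, 0x04E1,  # Cyrillic Abkhasian dze
    0x042B, 0x044B,  # Cyrillic yeru
]

# Map each code point to its formal character name.
names = {cp: unicodedata.name(chr(cp)) for cp in mentioned}
for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")

# Greek small gamma followed by combining caron.
gamma_caron = "\u03b3\u030c"
composed = unicodedata.normalize("NFC", gamma_caron)
print(len(composed))  # number of code points after composition
```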

So, the question is this: Should we say that this writing system is
completely Latin (keeping the norm that orthographic writing systems use a
single script) and apply the principle of unification -- across languages
but not across scripts -- to imply that we need to encode new characters,
Latin delta, Latin theta and Latin yeru? Or, do we say that this writing
system is only *mostly* Latin-based, and that it mixes in a few characters
from other scripts?
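
Under the second reading, text in this writing system would show up as mixed-script to software. Here is a rough sketch of what that looks like, assuming Python. Its standard library does not expose the Unicode Script property, so the first word of each character name is used as a stand-in for the script; that is an approximation only. The helper names and the sample string are hypothetical, and the sample is just a sequence of the letters discussed here, not a real Wakhi word.

```python
import unicodedata

def script_guess(ch):
    # Assumption: the first word of the character name approximates the script.
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in(text):
    # Only letters are considered; combining marks, digits and punctuation
    # are skipped because they are shared across scripts.
    return {script_guess(ch) for ch in text if ch.isalpha()}

# Hypothetical sample: a, delta, theta, yeru, ezh, gamma + combining caron.
sample = "a\u03b4\u03b8\u044b\u0292\u03b3\u030c"
print(sorted(scripts_in(sample)))
```

If Latin delta, Latin theta and Latin yeru were encoded instead, the same check on text using those new characters would report a single script.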

I have an idea what I think is the better thing to do, but I'm curious to
see if it matches others' opinions.



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>


(See attached file: Luqo Injil_38.gif)

