One of the Unicode design principles is unification: "unify across languages, but not across scripts". As a result, the "A" used in all Latin-based writing systems is the same character, but that character is different from the "A" used in Cyrillic- or Greek-based writing systems.
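To make that distinction concrete, here is a minimal sketch, assuming Python 3 and its standard unicodedata module. The names LOOKALIKES and describe are purely illustrative. It shows that the visually similar capital "A" of each script is a separate code point with its own character name:

```python
import unicodedata

# Three visually similar capital letters, one per script.
LOOKALIKES = {
    "Latin": "\u0041",
    "Cyrillic": "\u0410",
    "Greek": "\u0391",
}

def describe(ch):
    """Return (code point label, Unicode character name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for script, ch in LOOKALIKES.items():
    code_point, name = describe(ch)
    print(f"{script:9} {code_point} {name}")
```

The script of each character can be read off the first word of its name; this is only a rough proxy for the formal Script property, which the unicodedata module does not expose directly.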
There are a very small number of cases of truly eclectic writing systems: the IPA transcription system uses mostly Latin characters but also a few Greek characters, and Japanese writing mixes three scripts (complete repertoires of two scripts, and a large portion of a third). (One might debate whether we should describe Japanese writing as a single writing system involving three scripts or as the simultaneous use of three writing systems. I have been inclined toward the former, but that's another topic.) Of course, digits and punctuation get shared, but the norm is that a writing system for a given language is based on a single script, and IPA and Japanese are clearly exceptions.

That intro may well spawn a number of sub-threads, but I'm interested in only one question. It has to do with an Asian language, Wakhi (http://www.ethnologue.com/show_language.asp?code=WBL). This is spoken in Afghanistan, China, Pakistan and Tajikistan (reportedly, with similar populations in each country). I don't know if the same writing system is used in all four countries, but there is at least one writing system, which is Latin-based. (There also appears to be a distinct Cyrillic-based writing system in use.)

What is unusual about this Latin-based writing system is that its creators (I don't know who they were) were a little bit eclectic: whereas most of the characters are from the Latin script, it also uses three Greek characters and one Cyrillic character: gamma, delta, theta, and Cyrillic yeru (U+042B, U+044B). I've attached a GIF showing a sample of a page from a publication that shows all four of these characters (though not both upper and lower case; note that the gamma is also used with a combining caron to create another grapheme). (The gamma is designed like the Greek gamma, U+0393 / U+03B3, and not the Latin gamma, U+0194 / U+0263.
Also, it uses an ezh, which could possibly be represented as the Cyrillic characters "Abkhasian Dze / dze", U+04E0 / U+04E1, but given that the vast majority of characters are Latin, it makes more sense to consider these to be the Latin characters Ezh / ezh, U+01B7 / U+0292.)

So, the question is this: Should we say that this writing system is completely Latin (keeping the norm that orthographic writing systems use a single script) and apply the principle of unification -- across languages but not across scripts -- to imply that we need to encode new characters, Latin delta, Latin theta and Latin yeru? Or do we say that this writing system is only *mostly* Latin-based, and that it mixes in a few characters from other scripts?

I have an idea of what I think is the better thing to do, but I'm curious to see if it matches others' opinions.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>

(See attached file: Luqo Injil_38.gif)