I dunno, Curtis. This sounds less like a job for Unicode and more like a job for other 
mechanisms, such as user-defined locales.

Granted, keyboarding is a pain if you choose a character collection that is not 
represented by a convenient keyboard. But the real issues appear to be mostly in 
linguistically related processing (word breaking, sentence breaking, collation, 
and the like). In most cases these are not something that Unicode per se can help 
with, but user-defined locale data could.

Let's take the putative Tongva @ letter as an example. If I had to create a locale in, 
say, Java for it, I could create special casing information (if @ has case), a 
collation table, breaking tables, and the like and nail most of the issues that you 
have. Even loading a "spell checker" is really a locale- or language-related problem 
in most systems today. The main problem would be if you were using @ but actually 
MEANT û or some such. E.g. the Klingon problem, but with a real language.
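
To make that concrete, here's a minimal sketch of the collation piece in Java. The standard pattern is to take the rules from an existing RuleBasedCollator and append a tailoring; everything specific to Tongva here is invented for illustration (I've arbitrarily placed '@' as a letter right after 'u', and the "words" are made up):

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

public class TongvaCollator {
    // Tailor the stock English collation so '@' sorts as a letter
    // immediately after 'u'. (The position is hypothetical; the real
    // orthography would dictate where the letter belongs.)
    static RuleBasedCollator build() {
        try {
            String base =
                ((RuleBasedCollator) Collator.getInstance(Locale.ENGLISH)).getRules();
            // '&' resets the position of '@'; punctuation must be quoted.
            return new RuleBasedCollator(base + " & u < '@'");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up strings just to show the tailored ordering.
        String[] words = { "@aa", "vaa", "uaa", "aaa" };
        Arrays.sort(words, build());
        System.out.println(Arrays.toString(words));
    }
}
```

The same append-a-tailoring approach is how the JDK documentation itself suggests building modified collators; casing and break iteration are less tweakable in core Java, which is part of the platform problem I mention below.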

When that's the case, you have a case for encoding a new character. But the 
escaping mechanisms in Unicode, like SpecialCasing, seem ample enough to handle 
minority languages like these in all cases where you are just creating an orthography 
using an existing writing system's bits and pieces. It's not like Unicode has defined, 
say, "vowelness" or pronunciation.
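
For reference, this is the shape of a SpecialCasing.txt entry (the İ line is a real one from the Unicode data files); a user-defined tailoring for a new orthography would need to carry the same kind of data:

```
# Format: <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
```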

IOW: If you have a new character that needs encoding, then the UTC can probably be 
cadged into encoding it. If you are using existing encoded characters from another 
writing system, then there is nothing to do >>in Unicode<< except note the exceptional 
use of those characters.

That does leave you with the much less happy problem of finding a platform with 
user-defined locales (approximately no platforms conveniently do this).

Obviously I'm not an expert in these linguistic areas (and hence rarely comment on 
them), but it seems to me that the lack of other mechanisms makes Unicode an 
attractive target for criticism in this area.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.
432 Lakeside Drive
Sunnyvale, California, USA
+1 408.962.5487 (phone)  
+1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture.
It is not a feature. 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
> Behalf Of Curtis Clark
> Sent: Friday, July 26, 2002 6:46 PM
> To: [EMAIL PROTECTED]
> Subject: *not* Tamil - changing scripts (long)
> 
> 
> James Kass wrote:
> > Isn't this kind of a Catch-22 for anyone contemplating script reform?
> > Do we discourage people from altering their own scripts?  Should we?
> > It is suggested that scripts can be "alive" in the same sense that
> > languages are "alive"; changes (which are part of life) just occur
> > much more slowly in scripts.
> 
> This touches on some "Unicode vs. the world" issues I've been thinking 
> about, having to do with indigenous peoples developing orthographies for 
> their own languages.
> 
> My two examples are both languages of the Takic group in southern 
> California. The Luiseño language declined to a very few native speakers, 
> but has enjoyed a renaissance in recent years. The Gabrieleno (Tongva) 
> language was effectively extinct—no native speakers, no recordings, some 
> amount of written documentation—but the Tongva are resurrecting it (it 
> is similar enough to the other Takic languages that it is possible to 
> reconstruct parts that are missing).
> 
> Anthropological accounts of both languages are of course in the phonetic 
> alphabets beloved by linguists in the days before IPA stabilization. 
> And, like many other Native Americans, the Luiseño and Tongva have 
> wanted simpler orthographies that can be typed with US-English keyboards.
> 
> I don't have a lot of familiarity with Luiseño, but web pages have 
> included passages where non-letters (such as @) are used as letters. 
> This solves the keyboarding problem (since few people would try to 
> pronounce an email address as Luiseño), but I imagine all sorts of 
> issues with sorting, searching, word selection, casing, and all the 
> other sorts of things that computers can do for "major" languages.
> 
> Where all this involves me is with Tongva. I have been working with a 
> Tongva ethnobotanist on a project that, among other things, involves 
> plant labels in Tongva, English, and Latin. Tongva spelling is currently 
> inconsistent, and my colleague has been regularizing it for this project 
> (because he is the primary language teacher for the nation, and few have 
> any fluency at all, he has this freedom). Somewhat like English, Tongva 
> represents both the "oo" and "uh" sounds by "u". Unlike English, 
> the rest of the orthography provides no clues to which sound is meant.
> 
> /If/ my colleague were to ask (and the Tongva may be satisfied with the 
> existing orthography), I would suggest representing the "uh" sound with 
> a Latin-1 letter (possibly û), and explain several simple alternatives 
> for keyboarding it on Mac and Windows. I would *not* suggest overloading 
> @, or some similar approach.
> 
> I suppose that Unicode could add at some point "Luiseño letter @", with 
> appropriate properties, but that would circumvent the reason for picking 
> it: its presence in US-ASCII. In an ideal world, indigenous peoples 
> would hook up with folks like Michael Everson (or even me) and get some 
> guidance on how to have their orthography and eat it, too, but as things 
> now stand, overloading, font hacks, and the like are the path of least 
> resistance.
> 
> -- 
> Curtis Clark                  http://www.csupomona.edu/~jcclark/
> Mockingbird Font Works                  http://www.mockfont.com/
> 