Gautam--
[Gautam]: Well, too bad. I guess we still have an obligation to explore the extent of sub-optimal solutions that are being imposed upon South-Asian scripts for the sake of *backward compatibility* or simply because they are "fait accomplis". (See Peter Kirk's posting on this issue). However, I am by no means suggesting that the fault lies with the Unicode Consortium.
I'm a little
confused by this statement. What would be the difference between
sticking with a suboptimal solution because it's a fait accompli and sticking
with it out of the need for backward compatibility? The need for
backward compatibility exists because the suboptimal solution is a fait
accompli. Or are you stating that backward compatibility is a specious
argument because the encoding is so broken nobody's actually using
it?
[Gautam]: This is again the "fait accompli" argument. We need to *know* whether adopting an alternative model WOULD HAVE BEEN PREFERABLE, even if the option to do so is no longer available to us.
I
don't understand. If the option to go to an alternative model is not
available, why is it important to know that the alternative model would have
been preferable?
[Gautam]: I think there is a slight misunderstanding here. The ZWJ I am proposing is script-specific (each script would have its own), call it "ZWJ PRIME" or even "JWZ" (in order to avoid confusion with ZWJ). It doesn't exist yet and hence has no semantics.
Okay. Maybe I'm dense, but this wasn't clear to me from your other
emails. You're not proposing that U+200D be used to join Indic consonants
together; you're basically arguing for virama-like functionality that goes far
enough beyond what the virama does that you're not comfortable calling it a
virama anymore.
[Gautam]: JWZ is a piece of formalism. Its meaning would be precisely what we choose to assign to it. It behaves like the existing (script-specific) VIRAMAs except that it also occurs between a consonant and an independent vowel, forcing the latter to show up in its combining form.
Aha! This is what I wasn't parsing out of your previous
emails. It was there, but I somehow didn't grok it. To
summarize:
Tibetan deals with consonant clusters by encoding each of the consonants
twice: One series of codes is to be used for the first consonant in a cluster,
and the other series is to be used for the others. The Indian scripts
don't do this; they use a single series of codes for the consonants and cause
consonants to form clusters by adding a VIRAMA code between them. But the
Indian scripts still have two series of VOWELS more or less analogous to the two
series of consonants in Tibetan. When you want a non-joining vowel, you
use one series, and when you want a joining vowel, you use the
other.
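To make that concrete, here's a minimal sketch, in Python string literals, of
how the status quo looks at the code point level (the names in the comments
are the standard Unicode character names):

# Tibetan encodes each consonant twice: a base series, and a subjoined
# series for the non-initial members of a cluster.
TIB_KA           = "\u0F40"   # TIBETAN LETTER KA
TIB_SUBJOINED_KA = "\u0F90"   # TIBETAN SUBJOINED LETTER KA

# Devanagari has one consonant series and forms clusters with a VIRAMA...
DEV_KA     = "\u0915"   # DEVANAGARI LETTER KA
DEV_TA     = "\u0924"   # DEVANAGARI LETTER TA
DEV_VIRAMA = "\u094D"   # DEVANAGARI SIGN VIRAMA
kta = DEV_KA + DEV_VIRAMA + DEV_TA   # the KTA conjunct

# ...but it does have two vowel series, loosely parallel to Tibetan's two
# consonant series: an independent (non-joining) letter and a dependent
# (joining) vowel sign.
DEV_I_INDEPENDENT = "\u0907"   # DEVANAGARI LETTER I
DEV_I_DEPENDENT   = "\u093F"   # DEVANAGARI VOWEL SIGN I
ki = DEV_KA + DEV_I_DEPENDENT  # the syllable KI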
You want to have one series of vowels and extend the virama model to
combining vowels. Thus, you'd represent KI as KA + VIRAMA + I; KA + I
would represent two syllables: KA-I. Since a real virama never does this,
you're using a different term ("JWZ" in your most recent message) for the
character that causes the joining to happen. You're not proposing any
difference in how consonants are treated, other than having this new character
serve the sticking-together function that the VIRAMA now serves and changing
the existing VIRAMA to always display explicitly.
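If I've got that right, the syllable KI would look like this under the two
models. (The JWZ code point below is purely hypothetical; I've parked it in
the private-use area just so the sketch is well formed.)

KA      = "\u0915"   # DEVANAGARI LETTER KA
I_DEP   = "\u093F"   # DEVANAGARI VOWEL SIGN I (dependent)
I_INDEP = "\u0907"   # DEVANAGARI LETTER I (independent)
JWZ     = "\uE000"   # hypothetical placeholder; no such character exists

ki_today      = KA + I_DEP           # KI as Unicode encodes it now
ki_proposed   = KA + JWZ + I_INDEP   # KI under your model
ka_i_proposed = KA + I_INDEP         # two syllables, KA-I, under your model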
Now do I understand you? Sorry for my earlier
misunderstandings.
[Gautam]: Now that we have freed up all those code points occupied by the combining forms of vowels by introducing the VIRAMA with extended function, let us introduce an explicit (always visible) VIRAMA. That's all.
As far as Unicode is concerned, you can't "free up" any code
points. Once a code point is assigned, it's always assigned. You can
deprecate code points, but that doesn't free them up to be reused; it only (with
luck) keeps people from continuing to use them.
It seems to me that a system could support the usage you want and the old
usage at the same time. I could be wrong, but I'm guessing that KA +
VIRAMA + I isn't a sequence that makes any sense with current implementations
and isn't being used. It would be possible to extend the meaning of the
current VIRAMA to turn the independent vowels into dependent vowels.
Future use of the dependent-vowel code points could be discouraged in favor of
VIRAMA plus the independent-vowel code points. Old documents would
continue to work, but new documents could use the model you're after. (You
get the explicit virama the same way you do now: VIRAMA + ZWNJ.) This
solution would involve encoding no new characters and no removal of existing
characters, but just a change in the semantics of the
VIRAMA.
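To sketch what I mean: a rendering engine or conversion step could
(hypothetically) fold the new spelling onto today's dependent-vowel code
points, so old and new documents come out the same. The normalize() function
and the partial vowel table here are just illustrations, not anything that
exists today:

VIRAMA = "\u094D"                  # DEVANAGARI SIGN VIRAMA
INDEP_TO_DEP = {
    "\u0906": "\u093E",            # LETTER AA -> VOWEL SIGN AA
    "\u0907": "\u093F",            # LETTER I  -> VOWEL SIGN I
    "\u0909": "\u0941",            # LETTER U  -> VOWEL SIGN U
    # ...and so on for the remaining vowels
}

def normalize(text):
    out, i = [], 0
    while i < len(text):
        if (text[i] == VIRAMA and i + 1 < len(text)
                and text[i + 1] in INDEP_TO_DEP):
            out.append(INDEP_TO_DEP[text[i + 1]])   # new spelling: VIRAMA + independent vowel
            i += 2
        else:
            out.append(text[i])                     # old spellings pass through untouched
            i += 1
    return "".join(out)

assert normalize("\u0915\u094D\u0907") == "\u0915\u093F"   # KA + VIRAMA + I -> KI
assert normalize("\u0915\u093F")       == "\u0915\u093F"   # existing KI is unchanged

Old documents would pass through unchanged, and KA + VIRAMA + consonant would
still form conjuncts, since the rewrite only fires when an independent vowel
follows the VIRAMA.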
That said, I'm not sure this is a good idea. If what you're really
concerned about is typing and editing of text, you can have that work the way
you want without changing the underlying encoding model. It involves
somewhat more complicated keyboard handling, but I'm pretty sure all the major
operating systems allow this. The basic idea is that you have one set of
vowel keys that normally generate the independent-vowel code points, but if one
of them is preceded by the VIRAMA key, the two keystrokes map to a single
character: the dependent-vowel code point. This is a simple solution that
can be implemented today with very little fuss and involves no changes to
Unicode or to the various fonts and rendering engines that would be required if
the VIRAMA code point took on a new meaning. From a user's point of view,
things work the way they're supposed to, and they work that way sooner than if
Unicode is changed. Only programmers have to worry about the actual
encoding details, and unless keeping the existing model makes THEIR jobs
significantly harder, the encoding itself shouldn't change.
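Here's roughly what I have in mind, boiled down to a toy input-method loop.
The key names and the type_keys() function are invented for illustration; a
real layout would go through the platform's keyboard machinery rather than a
Python function:

VIRAMA_KEY = "virama"              # invented key name
VOWEL_KEYS = {                     # key -> (independent, dependent)
    "i": ("\u0907", "\u093F"),     # LETTER I, VOWEL SIGN I
    "u": ("\u0909", "\u0941"),     # LETTER U, VOWEL SIGN U
}

def type_keys(keys):
    out, pending_virama = [], False
    for key in keys:
        if key == VIRAMA_KEY:
            pending_virama = True
        elif key in VOWEL_KEYS and pending_virama:
            out.append(VOWEL_KEYS[key][1])   # VIRAMA key + vowel key -> dependent vowel
            pending_virama = False
        elif key in VOWEL_KEYS:
            out.append(VOWEL_KEYS[key][0])   # plain vowel key -> independent vowel
        else:
            if pending_virama:
                out.append("\u094D")         # a real VIRAMA goes out before a consonant
                pending_virama = False
            out.append(key)                  # consonant keys emit their characters directly
    if pending_virama:
        out.append("\u094D")
    return "".join(out)

# KA key, VIRAMA key, I key -> KA + VOWEL SIGN I (the syllable KI)
assert type_keys(["\u0915", VIRAMA_KEY, "i"]) == "\u0915\u093F"
# KA key, VIRAMA key, TA key -> an ordinary KTA conjunct
assert type_keys(["\u0915", VIRAMA_KEY, "\u0924"]) == "\u0915\u094D\u0924"

The point is that the user sees the behavior you want, while the code points
underneath stay exactly as they are today.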
I hope this makes sense...
--Rich Gillam
Language Analysis Systems, Inc.

