Gautam--

>Take a second look. My suggestion amounts to:
>
>1. retaining the script-specific virama as it is. Its
>existing behavior remains unchanged. I rename it as
>"(script-specific) ZWJ" merely for my convenience and
>conceptual clarity.
>
>2. extending the role of this script-specific ZWJ to
>encode combining forms of vowels in CV sequences,
>entirely in line with the way it is used to encode CC
>ligatures.
>
>[1 and 2 may sound somewhat different from what I have
>suggested above, but they are in effect the same].
>
>3. introducing a script-specific explicit virama,
>which we can very well afford after getting rid of all
>the combining forms of vowels.
>
>4. getting rid of *all* precomposed forms including
>the recent innovations in Devanagari that are used
>only for transliteration. These not only fill up the
>code space of Devanagari but also put constraints on
>the placement of characters in the code spaces of
>other Indian scripts.
>
>How much recoding would these changes involve? Would
>the cost be really unacceptable?
Yes, the cost is really unacceptable. Two of the most basic Unicode stability policies dictate that character assignments, once made, are never removed and that character names can never change. Step 4 cannot happen; the best that can happen is that the code points in question can be deprecated. The renaming you suggest in 1 cannot happen either.

The change in the encoding model for the virama can't happen either; there are too many implementations based on it, and there are too many documents out there that use the current encoding model. Your suggestion wouldn't make them unreadable when opened with software that did things the way you're suggesting, but it would change their appearance in ways that are unlikely to be acceptable.

[I preface what follows with the observation that I'm not by any stretch of the imagination an expert on Indic scripts, but I do fancy myself an expert on Unicode.]

I'm also pretty sure that using ZWJ as a virama won't work and isn't intended to work. KA + ZWJ + KA means something totally different from KA + VIRAMA + KA, and I, for one, wouldn't expect them to be drawn the same.

U+0915 represents the letter KA with its inherent vowel sound; that is, it represents the whole syllable KA. Two instances of U+0915 in a row would thus represent "KAKA", completely irrespective of how they're drawn. Introducing a ZWJ in the middle would allow the two SYLLABLES to ligate, but there's no ligature that represents "KAKA", so you should get the same appearance as you do without the ZWJ. The virama, on the other hand, cancels the vowel sound on the KA, turning it into K: the sequence KA + VIRAMA + KA represents the syllable KKA, again irrespective of how it is drawn.

In other words, ZWJ is intended to change the APPEARANCE of a piece of text without changing its MEANING (there are exceptions in the Arabic script, but this is the general rule).
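[Editor's illustration, not part of the original post: the code point sequences being contrasted above, written out in Python. The variable names are mine; the code points are the standard Devanagari and joiner assignments.]

```python
KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

kaka = KA + KA            # two syllables: "KAKA"
kaka_zwj = KA + ZWJ + KA  # still "KAKA" -- ZWJ may affect appearance only
kka = KA + VIRAMA + KA    # one syllable: "KKA" -- the virama changes meaning

# The ZWJ sequence differs from the plain one as a code point string,
# but it is supposed to MEAN the same text; the virama sequence is not.
for name, s in [("KAKA", kaka), ("KAKA+ZWJ", kaka_zwj), ("KKA", kka)]:
    print(name, [f"U+{ord(c):04X}" for c in s])
```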
Having KA + ZWJ + KA render as the syllable KKA would break this rule: the ZWJ would be changing the MEANING of the text. Whether the syllable KKA gets drawn with a virama, a half-form, or a ligature is the proper province of ZWJ and ZWNJ, and this is what they're documented in TUS to do. But ZWJ can't (and shouldn't) be used to turn KAKA into KKA.

Maybe it was unfortunate to call U+094D a "virama," since it doesn't necessarily get drawn as a virama (or, indeed, as anything), but it's too late to revisit that decision. For that matter, it may have been a mistake to use the virama model to encode conjunct forms in Bengali, but it's too late to change that now. Real users generally shouldn't have to care, though; this is an issue for programmers and font designers. Their lives may be harder than they should have been, but unless it's horribly hard for them to produce the right effects for their users, it isn't worth it to reopen the issue of the Unicode encoding of Indic scripts, especially the ones that have been in Unicode for more than a decade now.

There are lots of things that suck about Unicode, but on the whole, it's way better than what came before and solves more problems than it creates. Backward compatibility is a pain in the butt, and it forces us to live with a lot of mistakes and suboptimal solutions we wish we didn't have to live with. But backward compatibility is also good-- it means the solution was good enough in the first place that people are using it.

--Rich Gillam
Language Analysis Systems, Inc.
"Unicode Demystified"
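[Editor's illustration, not part of the original post: the TUS-documented use of ZWJ and ZWNJ after a virama to steer how KKA is drawn without changing what it means. Actual rendering depends on the font and shaping engine; the helper function is mine.]

```python
KA, VIRAMA = "\u0915", "\u094D"
ZWJ, ZWNJ = "\u200D", "\u200C"   # ZERO WIDTH JOINER / NON-JOINER

kka_default = KA + VIRAMA + KA          # renderer's choice (usually the conjunct ligature)
kka_half_form = KA + VIRAMA + ZWJ + KA  # request the half-form of the first KA
kka_explicit = KA + VIRAMA + ZWNJ + KA  # request a visible virama on the first KA

# All three sequences represent the same syllable KKA, so a comparison of
# their CONTENT (as opposed to their appearance) should ignore the joiners:
def strip_joiners(s: str) -> str:
    return "".join(c for c in s if c not in (ZWJ, ZWNJ))

assert strip_joiners(kka_half_form) == strip_joiners(kka_explicit) == kka_default
```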

