--- Marco Cimarosti <[EMAIL PROTECTED]> wrote:
> Gautam Sengupta wrote:
> > I am no programmer, but surely the rendering engine
> > could be tweaked to display a halant/hashant in the
> > aforementioned situations? I understand that it won't
> > happen *automatically* if we were to use <ZWJ> instead
> > of <VIRAMA>. But if you were to take the trouble to do
> > the tweaking, you'd then have a completely *intuitive*
> > encoding for vowel yaphala sequences,
> > <vowel><ZWJ><Y>, instead of oddities like
> > <vowel><VIRAMA><Y>.
>
> OK but, then, your <ZWJ> becomes exactly what
> Unicode's <VIRAMA> has always been: a character that
> is normally invisible, because it merges in a ligature
> with adjacent characters, but occasionally becomes
> visible when a font does not have a glyph for that
> combination.

You are absolutely right. I am suggesting that the
script-specific viramas be retained as *explicit* viramas
that never disappear. In addition, let's have a
script-specific ZWJ which behaves in the way you describe
in the preceding paragraph. The explicit virama (or rather
the ONLY virama) will never appear after a vowel, but the
script-specific ZWJ will, as in <A><ZWJ><Y><AA> encoding
A+YAPHALA+AA. The cost is just one additional code point
for each script. Note that we will no longer need the
combining vowels or an additional code point for YAPHALA.

> But there is one detail which makes your approach
> much more complicated: what we have been calling
> <VIRAMA> is *not* a single character. Every Indic
> script has its own: <DEVANAGARI SIGN VIRAMA>,
> <BENGALI SIGN VIRAMA>, and so on.
>
> Each one of these characters, when displayed visibly,
> has a distinct glyph: a Bangla hashant is a small "/"
> under the letter, a Tamil virama is a dot over the
> letter, etc.
>
> With your approach, the single character <ZWJ> is
> overloaded with a dozen different glyphs depending on
> which script the adjacent letters belong to. Plus, it
> still has to be invisible when used in a non-Indic
> script, such as Arabic.
>
> Implementing all this is certainly possible, but
> would result in bigger look-up tables, for no
> advantage at all.

See my previous paragraph.

> > Perhaps there isn't a *problem* as such, and perhaps
> > naturalness and intuitive acceptability aren't *key*
> > features of the system, but surely other factors being
> > equal they ought to be taken into consideration in
> > choosing one method of encoding over another?
>
> Yes. But the flaws that I see in the ISCII/Unicode
> model are much smaller than you imply.
> E.g., I agree that it would have been more logical if:
>
> - independent and dependent vowels were the same
>   characters;
>
> - each script was encoded in its natural alphabetical
>   order;
>
> - there were no precomposed and decomposed
>   alternatives for the same graphemes.
>
> And others, on which perhaps a linguist won't agree,
> but which would have made life much easier for
> programmers:
>
> - all vowels were encoded in visual order, so that
>   no vowel reordering was necessary;
>
> - "repha ra" were encoded as a separate character,
>   so that no reordering at all was necessary.

I agree with you on all of these issues. You have in fact
summed up my critique of the ISCII/Unicode model. The only
point I'd like to add here is that these mistakes were
avoidable and should have been avoided. There can be no
excuse for placing the Assamese r and v the way they are
currently placed. The same goes for the long syllabic R
and L.

> But, all summed up, living with these little flaws
> is *much* simpler than trying to change the rules of
> a standard a dozen years after people started
> implementing it.

Take a second look. My suggestion amounts to:

1. retaining the script-specific virama as it is. Its
existing behavior remains unchanged. I rename it
"(script-specific) ZWJ" merely for my convenience and
conceptual clarity.

2. extending the role of this script-specific ZWJ to
encode combining forms of vowels in CV sequences, entirely
in line with the way it is used to encode CC ligatures.
[1 and 2 may sound somewhat different from what I have
suggested above, but they are in effect the same].

3. introducing a script-specific explicit virama, which we
can very well afford after getting rid of all the
combining forms of vowels.

4. getting rid of *all* precomposed forms, including the
recent innovations in Devanagari that are used only for
transliteration.
These not only fill up the code space of Devanagari but
also put constraints on the placement of characters in the
code spaces of other Indian scripts.

How much recoding would these changes involve? Would the
cost really be unacceptable?

Best,
Gautam
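
For anyone following the thread who would like the virama behaviour discussed above made concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `shape` function and its `conjuncts` table are hypothetical stand-ins for a rendering engine and a font's ligature table, not how any real shaper is implemented. The code points themselves and the `unicodedata` calls are real. It shows the two points at issue: every script has its own virama character while ZWJ is a single shared one, and a virama stays invisible when the font has a conjunct glyph but becomes visible when it does not.

```python
import unicodedata

# Real code points: each script has its own virama; ZWJ is shared by all.
VIRAMAS = {
    "Devanagari": "\u094D",
    "Bengali": "\u09CD",
    "Tamil": "\u0BCD",
}
ZWJ = "\u200D"


def is_virama(ch):
    # Unicode gives these signs canonical combining class 9 ("Virama").
    return unicodedata.combining(ch) == 9


def shape(text, conjuncts):
    """Toy shaper (hypothetical): map a string to a list of glyph labels.

    `conjuncts` is an assumed font table: a set of (first, second)
    consonant pairs for which the font has a ligature glyph.
    """
    glyphs = []
    i = 0
    while i < len(text):
        ch = text[i]
        has_pair = i + 2 < len(text) and is_virama(text[i + 1])
        if has_pair and (ch, text[i + 2]) in conjuncts:
            # The virama merges into the ligature and is not drawn.
            glyphs.append("conjunct")
            i += 3
        elif is_virama(ch):
            # No ligature available: fall back to a visible sign.
            glyphs.append("visible virama")
            i += 1
        else:
            glyphs.append(unicodedata.name(ch))
            i += 1
    return glyphs


KA, SSA = "\u0995", "\u09B7"  # BENGALI LETTER KA, BENGALI LETTER SSA
word = KA + VIRAMAS["Bengali"] + SSA

for script, sign in VIRAMAS.items():
    print(script, unicodedata.name(sign))
print(shape(word, {(KA, SSA)}))  # font with the KA+SSA conjunct
print(shape(word, set()))        # font without it
```

Under the proposal in this message, the character between the two consonants would be a script-specific joiner with the same fallback, and a separate explicit virama would always take the "visible" branch; the sketch does not model that second character.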

