In Unicode, U+0BBE, U+0BC6 and U+0BCA are all dependent vowel signs. IE is probably treating a base character and any dependent vowels that follow it as a single unit. Since in some fonts a base character plus a combining vowel sign may be displayed by a single ligature glyph, it makes sense to apply the formatting of the base character to any dependent combining characters as well.
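The grouping described above (a base character plus the dependent vowel signs that follow it, treated as one unit) can be approximated in a few lines. This is a simplified sketch, assuming that IE's behaviour roughly matches grouping by Unicode general category; it is not how IE is actually implemented, it ignores conjuncts and the fuller segmentation rules of UAX #29, and the function names `clusters` and `expand_to_clusters` are hypothetical.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Split text into units of one base character plus following marks.

    Simplification: any character whose general category starts with "M"
    (Mn, Mc, Me) is attached to the preceding unit. The Tamil vowel signs
    U+0BBE, U+0BC6 and U+0BCA are combining marks, so they stay with their
    consonant. Conjuncts, ZWJ/ZWNJ and full UAX #29 rules are not handled.
    """
    units: list[str] = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def expand_to_clusters(text: str, start: int, end: int) -> tuple[int, int]:
    """Widen a styled code-point range [start, end) to whole units,
    mimicking what IE 5.5 appears to do with a styled consonant."""
    bounds = [0]
    for unit in clusters(text):
        bounds.append(bounds[-1] + len(unit))
    new_start = max(b for b in bounds if b <= start)
    new_end = min(b for b in bounds if b >= end)
    return new_start, new_end


# Styling only the consonant la (U+0BB2) in "lo" widens to the whole syllable.
print(expand_to_clusters("\u0BB2\u0BCA", 0, 1))  # (0, 2)
```

Under this model a span that covers only the consonant can never end between the consonant and its vowel sign, which matches the "style expands to the entire orthographic syllable" result reported below.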
In Mozilla, you may be completely breaking the font lookups by separately formatting the different parts of a conjunct. In legacy glyph-based Tamil encodings there was a simple one-to-one correspondence between characters and glyphs, so it was straightforward to apply different formatting to different characters.

-- 
Christopher J. Fynn

----- Original Message -----
From: "Peter Jacobi" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, December 06, 2003 6:39 PM
Subject: Transcoding Tamil in the presence of markup

> Dear All,
>
> I am attempting to transcode Tamil text (in legacy 8-bit encodings, which
> are in visual glyph order, being heirs of the Tamil typewriter) into Unicode
> (which uses the 'logical' order invented by ISCII):
> http://www.jodelpeter.de/i18n/tamil/xref-uc.htm
>
> When I thought my converter was ready, I had a severe collision
> with reality as I tried it on some webpages.
>
> The problem: in the legacy encoding you can style individual characters,
> which not only breaks my simple converter, but which may have no
> good equivalent in Unicode anyway. See this example:
> (all legacy encoded Tamil is shown using C-style escapes, Unicode Tamil as
> NCR)
>
> Converting unstyled text
> from TSCII
> lA \xC4\xA1
> le \xA7\xC4
> lo \xA7\xC4\xA1
> to Unicode
> lA லா
> le லெ
> lo லொ
>
> Now the consonant l should get a distinct color:
> In TSCII:
> lA <span style='color:#00f'>\xC4</span>\xA1
> le \xA7<span style='color:#00f'>\xC4</span>
> lo \xA7<span style='color:#00f'>\xC4</span>\xA1
>
> In Unicode:
> lA <span style='color:#00f'>ல</span>ா
> le <span style='color:#00f'>ல</span>ெ
> lo <span style='color:#00f'>ல</span>ொ
>
> It is easy to see that a simple n:m mapping cannot make this conversion.
> It is not that easy to judge whether this is the desired conversion at all,
> and what the receiving software should do with it.
> Some tests: in Mozilla 1.4.1 the characters fall apart, and in IE5.5 the
> style expands to the entire orthographic syllable.
>
> Unicode test page: http://www.jodelpeter.de/i18n/tamil/markup-uc.htm
> TSCII test page: http://www.jodelpeter.de/i18n/tamil/markup-tscii.htm
>
> After seeing this effect at its source, it's now clear why you can't style
> individual Tamil characters in a word processor when using Unicode (whereas
> you can do so in legacy encodings).
>
> It's hard to promote Unicode when things that have worked in the past
> stop working.
>
> Any insights?
>
> Regards,
> Peter Jacobi
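The visual-to-logical reordering in Peter's example, and why per-byte styling has no clean Unicode equivalent, can be sketched as a tiny converter that also records which input bytes each output code point came from. This is a minimal illustration, assuming only the three TSCII byte values that appear in the example (0xC4 for la, 0xA1 for the aa sign, 0xA7 for the e-sign written before the consonant); the table names and `tscii_to_unicode` are hypothetical, and a real converter needs the full TSCII table.

```python
# Hypothetical, minimal byte table: only the TSCII bytes from the example.
CONSONANTS = {0xC4: "\u0BB2"}  # TAMIL LETTER LA
AA_SIGN = 0xA1                 # visual aa sign, written after the consonant
E_PREFIX = 0xA7                # visual e sign, written before the consonant


def tscii_to_unicode(data: bytes) -> tuple[str, list[list[int]]]:
    """Convert visual-order TSCII bytes to logical-order Unicode.

    Returns the Unicode string and, for each output code point, the list
    of input byte offsets it was built from (the n:m alignment).
    """
    out: list[str] = []
    sources: list[list[int]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == E_PREFIX and i + 1 < len(data) and data[i + 1] in CONSONANTS:
            # Visual order: e sign, consonant [, aa sign]; logical: consonant first.
            out.append(CONSONANTS[data[i + 1]])
            sources.append([i + 1])
            if i + 2 < len(data) and data[i + 2] == AA_SIGN:
                out.append("\u0BCA")  # VOWEL SIGN O, canonically E + AA
                sources.append([i, i + 2])
                i += 3
            else:
                out.append("\u0BC6")  # VOWEL SIGN E
                sources.append([i])
                i += 2
        elif b in CONSONANTS:
            out.append(CONSONANTS[b])
            sources.append([i])
            i += 1
        elif b == AA_SIGN:
            out.append("\u0BBE")  # VOWEL SIGN AA
            sources.append([i])
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not in this sketch's table")
    return "".join(out), sources


text, sources = tscii_to_unicode(b"\xA7\xC4\xA1")  # "lo"
print([f"U+{ord(c):04X}" for c in text], sources)
# ['U+0BB2', 'U+0BCA'] [[1], [0, 2]]
```

In all three examples the styled byte 0xC4 maps to exactly one code point, so a span around it can be carried over. In "lo", however, U+0BCA is assembled from bytes 0 and 2, which sit on either side of the styled byte, so the Unicode markup necessarily places a span boundary inside a single orthographic syllable, which is where the renderers diverge.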

