Some comments below.

Mark
__________
http://www.macchiato.com
◄ “Eppur si muove” ►
----- Original Message -----
From: "Samphan Raruenrom" <[EMAIL PROTECTED]>
To: "Asmus Freytag" <[EMAIL PROTECTED]>
Cc: "Sreedhar M" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "Rick McGowan" <[EMAIL PROTECTED]>
Sent: Tuesday, July 16, 2002 07:22
Subject: Re: Is UniCode's Thai character representation is acceptable by TISI or not?

> Asmus Freytag wrote:
> > At 12:06 PM 7/16/02 +0700, Samphan Raruenrom wrote:
> >> There're some mistakes in Unicode char.
> >> properties for Thai char. and you have to "code around" that.
> > And the mistakes are?
>
> I've discussed a few of them here in this list. I'll write
> a more formal report on the issue later. Here're some titles
>
> Problems from Unicode properties
> - error in combining class of vowel signs make normalization worthless
> in some cases. This is important if you want to compare strings.

Meaning: the normalized forms of two strings are not equal in cases where Thais would consider them equal, right?

> - decomposition of SARA AM add more problem to normalization

I don't recall seeing that note; I'll look forward to your report.

> - some properties make grapheme cluster for Thai
> incompatible with the way Thai expect, e.g. PHINTHU as
> virama, SARA AM not a combining character

In the last UTC, action was taken that is not yet in the draft TR on boundaries. In particular, this affects Thai.

>
> Inaccuracy in the Unicode book
> - backspace 'always' use the same (grapheme cluster) character boundary
> as Del and left/right arrow. Actually Thai use backspace to delete single
> character not the whole cluster. So character boundary for backspace
> should be locale specific.

This text will be overridden by the TR.

> - in Thai, zero width space is said to be able to expand in full-justified
> paragraph. Actually it is always zero width.

There may be some misunderstanding here.
What is meant is: if you had the sequence ABCD, and between the B and the C was a zero-width space, AND you were using inter-character spacing for justification, you would not expect to see:

A BC D

Instead, you would expect to see:

A B C D

That is, the zero-width space does not prevent the characters from using inter-character spacing.

>
> These are things you have to know after learning the Unicode standard
> if you plan to work with Thai language, to 'code around' the problem
> to make it acceptable for Thai people.
> I plan to write a formal report on the issue, not to change the standard,
> but to note what is wrong and what have to be code around. So people
> who like to work with Thai language (like you) will know the right thing
> to do and not repeat the same mistake as in some softwares.
>
> --
> Samphan Raruenrom
> Information Research and Development Division,
> National Electronics and Computer Technology Center, Thailand.
> http://www.nectec.or.th/home/index.html
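To make the combining-class point concrete, here is a minimal sketch in Python using the standard `unicodedata` module. Python and the specific strings compared are my own choices for illustration, not anything proposed in the thread. The printed class values come from whatever Unicode Character Database version ships with the interpreter. One plausible case of "equal to a Thai reader, unequal after normalization" is an above vowel and a tone mark typed in either order. SARA I has combining class 0, so it blocks canonical reordering, and the two orders stay distinct under NFC. A below vowel (class 103) and a tone mark (class 107) are reordered and compare equal. The last lines show SARA AM passing through NFC unchanged but being split by NFKC. That split may or may not be the decomposition problem Samphan means.

```python
import unicodedata

# Thai code points used in this illustration (names as in the Unicode Character Database).
KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI: base consonant
SARA_I = "\u0e34"   # THAI CHARACTER SARA I: above vowel, combining class 0
SARA_U = "\u0e38"   # THAI CHARACTER SARA U: below vowel, combining class 103
MAI_EK = "\u0e48"   # THAI CHARACTER MAI EK: tone mark, combining class 107
SARA_AM = "\u0e33"  # THAI CHARACTER SARA AM: compatibility decomposition 0E4D 0E32


def same_after(form: str, a: str, b: str) -> bool:
    """True if a and b are identical after the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


for ch in (SARA_I, SARA_U, MAI_EK):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")

# Below vowel + tone mark: both classes are nonzero, so canonical
# reordering puts them in the same order and the strings compare equal.
print(same_after("NFC", KO_KAI + SARA_U + MAI_EK, KO_KAI + MAI_EK + SARA_U))  # True

# Above vowel + tone mark: SARA I has class 0, which blocks reordering,
# so the two typing orders remain different after NFC.
print(same_after("NFC", KO_KAI + SARA_I + MAI_EK, KO_KAI + MAI_EK + SARA_I))  # False

# SARA AM survives NFC unchanged but is split into NIKHAHIT + SARA AA by NFKC.
print(code_points(unicodedata.normalize("NFC", SARA_AM)))   # ['U+0E33']
print(code_points(unicodedata.normalize("NFKC", SARA_AM)))  # ['U+0E4D', 'U+0E32']
```

A string comparison that relies on NFC alone would therefore treat the two SARA I orders as different text. Some tailoring outside standard normalization would be needed to treat them as equal.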

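The justification explanation can be modeled with a toy Python sketch that uses the same ASCII placeholders as the ABCD example. Both functions are hypothetical illustrations, not any renderer's actual algorithm. `naive_spacing` treats the zero-width space as a glyph that blocks the gap it occupies, which gives `A BC D`. `intercharacter_spacing` keeps the zero-width space at zero width but lets the B|C gap expand like every other gap, which gives `A B C D`.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def naive_spacing(text: str, pad: str = " ") -> str:
    """Hypothetical *incorrect* model: the ZWSP glues its neighbours, so the
    gap it occupies never receives justification padding."""
    out = []
    for i, ch in enumerate(text):
        if ch == ZWSP:
            continue  # the ZWSP itself is drawn with zero width
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else None
        # Pad only when the next code point is a visible character.
        if nxt is not None and nxt != ZWSP:
            out.append(pad)
    return "".join(out)


def intercharacter_spacing(text: str, pad: str = " ") -> str:
    """The ZWSP stays zero width but does not block inter-character spacing,
    so every gap between visible characters receives padding."""
    visible = [ch for ch in text if ch != ZWSP]
    return pad.join(visible)


sample = "AB" + ZWSP + "CD"
print(repr(naive_spacing(sample)))           # 'A BC D'
print(repr(intercharacter_spacing(sample)))  # 'A B C D'
```

For real Thai text, the padding would go between grapheme clusters rather than between code points. Otherwise vowel signs and tone marks would be separated from their base consonants. The toy version skips that step for brevity.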
