Asmus Freytag wrote:
> At 12:06 PM 7/16/02 +0700, Samphan Raruenrom wrote:
>> There're some mistakes in Unicode char.
>> properties for Thai char. and you have to "code around" that.
> And the mistakes are?

I've discussed a few of them on this list already. I'll write a more formal report on the issue later. Here are the main headings.

Problems from Unicode properties

- Errors in the combining classes of vowel signs make normalization worthless in some cases. This matters if you want to compare strings.
- The decomposition of SARA AM adds further problems to normalization.
- Some properties make grapheme clusters for Thai incompatible with what Thai users expect, e.g. PHINTHU treated as a virama, and SARA AM not being a combining character.

Inaccuracies in the Unicode book

- Backspace is said to 'always' use the same (grapheme cluster) character boundary as Del and the left/right arrows. In practice, Thai users expect backspace to delete a single character, not the whole cluster, so the character boundary for backspace should be locale-specific.
- For Thai, zero width space is said to be able to expand in a fully justified paragraph. Actually it is always zero width.

These are things you have to know, after learning the Unicode standard, if you plan to work with the Thai language, so that you can 'code around' the problems and make the result acceptable to Thai people.

I plan to write a formal report on the issue, not to change the standard, but to note what is wrong and what has to be coded around. Then people who want to work with the Thai language (like you) will know the right thing to do and won't repeat the mistakes found in some software.

-- 
Samphan Raruenrom
Information Research and Development Division,
National Electronics and Computer Technology Center, Thailand.
http://www.nectec.or.th/home/index.html
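
For anyone who wants to check the two normalization points against the character database themselves, here is a minimal sketch using Python's standard unicodedata module. The helper names (ccc, same_after_nfc, codepoints) and the sample strings are only illustrative assumptions, a guess at the kind of case meant above rather than the cases a formal report would list, and the values you see depend on the Unicode data bundled with your Python.

```python
import unicodedata

# Thai characters used in the samples (code points from the Thai block).
KO_KAI = "\u0e01"    # THAI CHARACTER KO KAI (consonant)
NO_NU = "\u0e19"     # THAI CHARACTER NO NU (consonant)
SARA_AA = "\u0e32"   # THAI CHARACTER SARA AA
SARA_AM = "\u0e33"   # THAI CHARACTER SARA AM
SARA_I = "\u0e34"    # THAI CHARACTER SARA I (vowel sign above)
SARA_U = "\u0e38"    # THAI CHARACTER SARA U (vowel sign below)
PHINTHU = "\u0e3a"   # THAI CHARACTER PHINTHU
MAI_EK = "\u0e48"    # THAI CHARACTER MAI EK (tone mark)
MAI_THO = "\u0e49"   # THAI CHARACTER MAI THO (tone mark)
NIKHAHIT = "\u0e4d"  # THAI CHARACTER NIKHAHIT


def ccc(ch: str) -> int:
    """Canonical combining class of a single character."""
    return unicodedata.combining(ch)


def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings compare equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def codepoints(s: str) -> str:
    """Render a string as a list of U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


if __name__ == "__main__":
    # Combining classes: the vowel sign below and the tone mark have
    # non-zero classes, while the vowel sign above has class 0.
    for name, ch in [("SARA I", SARA_I), ("SARA U", SARA_U),
                     ("MAI EK", MAI_EK), ("PHINTHU", PHINTHU)]:
        print(f"{name}: ccc={ccc(ch)}")

    # Two typing orders of tone mark and vowel sign on the same consonant.
    # With SARA U, canonical reordering makes both orders compare equal.
    print(same_after_nfc(KO_KAI + MAI_EK + SARA_U, KO_KAI + SARA_U + MAI_EK))
    # With SARA I (class 0) nothing is reordered, so the strings stay different.
    print(same_after_nfc(KO_KAI + MAI_EK + SARA_I, KO_KAI + SARA_I + MAI_EK))

    # SARA AM has only a compatibility decomposition, so NFC/NFD keep it
    # while NFKC/NFKD split it into NIKHAHIT + SARA AA.
    word = NO_NU + MAI_THO + SARA_AM
    print(codepoints(unicodedata.normalize("NFC", word)))
    # After NFKD the tone mark ends up in front of NIKHAHIT.
    print(codepoints(unicodedata.normalize("NFKD", word)))
```

If the output matches what is described in the comments, any string comparison for Thai would need its own reordering step before or after normalization; that step is not shown here.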

