Some comments below.

Mark
__________
http://www.macchiato.com
◄  “Eppur si muove” ►

----- Original Message -----
From: "Samphan Raruenrom" <[EMAIL PROTECTED]>
To: "Asmus Freytag" <[EMAIL PROTECTED]>
Cc: "Sreedhar M" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "Rick
McGowan" <[EMAIL PROTECTED]>
Sent: Tuesday, July 16, 2002 07:22
Subject: Re: Is UniCode's Thai character representation is acceptable
by TISI or not?


> Asmus Freytag wrote:
> > At 12:06 PM 7/16/02 +0700, Samphan Raruenrom wrote:
> >> There are some mistakes in the Unicode character
> >> properties for Thai characters, and you have to "code around" them.
> > And the mistakes are?
>
> I've discussed a few of them here on this list. I'll write
> a more formal report on the issue later. Here are some headings:
>
> Problems from Unicode properties
> - errors in the combining classes of vowel signs make normalization
>    worthless in some cases. This is important if you want to compare
>    strings.

Meaning: the normalized forms of two strings are not equal in cases
where Thais would consider them equal, right?
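
A minimal sketch of the kind of comparison at issue, using Python's standard `unicodedata` module. The sample syllables (KO KAI plus one vowel sign and one tone mark) are chosen here for illustration and are not taken from Samphan's report; the values printed depend on the Unicode data shipped with the Python in use.

```python
import unicodedata

KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI (base consonant)
SARA_I = "\u0e34"   # vowel sign written above the consonant
SARA_U = "\u0e38"   # vowel sign written below the consonant
MAI_EK = "\u0e48"   # tone mark


def same_after_nfc(a, b):
    """Return True if the two strings are equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Show the canonical combining class of each mark.
for ch in (SARA_I, SARA_U, MAI_EK):
    print(unicodedata.name(ch), unicodedata.combining(ch))

# Vowel stored before the tone mark versus tone mark stored before the vowel.
below_equal = same_after_nfc(KO_KAI + SARA_U + MAI_EK, KO_KAI + MAI_EK + SARA_U)
above_equal = same_after_nfc(KO_KAI + SARA_I + MAI_EK, KO_KAI + MAI_EK + SARA_I)
print("below vowel, both orders equal:", below_equal)
print("above vowel, both orders equal:", above_equal)
```

The below-vowel pair compares equal because SARA U has a lower non-zero combining class than the tone mark, so normalization puts the two marks in one order. The above-vowel pair does not compare equal, because SARA I has combining class 0 and is never reordered.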

> - the decomposition of SARA AM adds more problems to normalization

I don't recall seeing that note; I'll look forward to your report.
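
For reference, the decomposition in question can be inspected with Python's `unicodedata` module. This sketch only shows what the mapping is; it makes no assumption about which specific problem the promised report will describe.

```python
import unicodedata

SARA_AM = "\u0e33"  # THAI CHARACTER SARA AM

# Raw decomposition field from the Unicode Character Database.
print(unicodedata.decomposition(SARA_AM))

# The mapping is a compatibility one, so NFC leaves SARA AM alone
# while NFKD splits it into two code points.
nfc = unicodedata.normalize("NFC", SARA_AM)
nfkd = unicodedata.normalize("NFKD", SARA_AM)
print([unicodedata.name(c) for c in nfc])
print([unicodedata.name(c) for c in nfkd])
```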

> - some properties make grapheme clusters for Thai
>    incompatible with what Thais expect, e.g. PHINTHU as
>    virama, SARA AM not being a combining character

At the last UTC meeting, action was taken that is not yet reflected in
the draft TR on boundaries. In particular, this affects Thai.

>
> Inaccuracy in the Unicode book
> - backspace 'always' uses the same (grapheme cluster) character
>    boundary as Del and left/right arrow. Actually Thais use backspace
>    to delete a single character, not the whole cluster. So the
>    character boundary for backspace should be locale-specific.

This text will be overridden by the TR.
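
The two backspace behaviors being contrasted can be sketched as follows. This is a deliberately simplified model, not the boundary rules of the TR: a "cluster" is assumed here to be one base character plus any trailing nonspacing or enclosing marks, and both function names are hypothetical.

```python
import unicodedata


def is_mark(ch):
    """Assumption: treat general categories Mn and Me as cluster extenders."""
    return unicodedata.category(ch) in ("Mn", "Me")


def backspace_code_point(text):
    """Delete only the last code point (the behavior described for Thai)."""
    return text[:-1]


def backspace_cluster(text):
    """Delete the last base character together with its trailing marks."""
    i = len(text)
    while i > 0 and is_mark(text[i - 1]):
        i -= 1                      # skip back over trailing marks
    return text[:max(i - 1, 0)]     # then drop the base character too


# KHO KHAI, then KO KAI + SARA II + MAI EK (one consonant with two marks).
sample = "\u0e02\u0e01\u0e35\u0e48"
after_code_point = backspace_code_point(sample)
after_cluster = backspace_cluster(sample)
print(len(sample), len(after_code_point), len(after_cluster))
```

With the sample above, the code-point version removes only the tone mark, while the cluster version removes the consonant and both of its marks.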

> - in Thai, the zero width space is said to be able to expand in
>    fully justified paragraphs. Actually it is always zero width.

There may be some misunderstanding here. What is meant is: if you had
the sequence ABCD, and between the B and the C was a zero-width space,
AND you were using inter-character spacing for justification, you would
not expect to see:

A      BC      D

Instead, you would expect to see:

A    B    C    D

That is, the zero-width space does not prevent the characters from
using inter-character spacing.
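
The ABCD example can be modeled in a few lines. This is only a toy sketch of the idea, assuming a monospaced text model in which extra spacing is shown as space characters; the function name and the `gap` parameter are made up for illustration.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def justify_inter_character(text, gap):
    """Spread `gap` spaces between every pair of adjacent visible characters.

    The zero-width space has no width of its own and does not remove or
    change the gap between its neighbors, so it is left out of the
    visible output.
    """
    visible = [ch for ch in text if ch != ZWSP]
    return (" " * gap).join(visible)


line = justify_inter_character("AB" + ZWSP + "CD", 4)
print(line)  # A    B    C    D
```

The B and C on either side of the zero-width space get the same gap as every other pair, which is the second layout shown above.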

>
> These are things you have to know after learning the Unicode
> standard if you plan to work with the Thai language, to 'code around'
> the problems and make the result acceptable to Thai people.
> I plan to write a formal report on the issue, not to change the
> standard, but to note what is wrong and what has to be coded around.
> So people who would like to work with the Thai language (like you)
> will know the right thing to do and not repeat the same mistakes as
> in some software.
>
> --
> Samphan Raruenrom
> Information Research and Development Division,
> National Electronics and Computer Technology Center, Thailand.
> http://www.nectec.or.th/home/index.html
>
>
>

