Re: Is Unicode's Thai character representation acceptable to TISI or not?

Markus Scherer Wed, 17 Jul 2002 18:08:55 -0700

The "SARA AM" problem seems to lie with the compatibility decomposition (NFKD and NFKC).
The NFK* forms change many characters and strings, not just Thai ones, in various visible and
functional ways, and must be used with caution.
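As a minimal sketch (Python, standard `unicodedata` module; the `forms` helper is a hypothetical name for illustration), normalizing word (a) from the quoted example below shows that only the compatibility forms touch SARA AM:

```python
import unicodedata
from typing import Dict

# Word (a) from the quoted example: KO KAI + MAI EK + SARA AM
word_a = "\u0E01\u0E48\u0E33"

def forms(s: str) -> Dict[str, str]:
    """Return s in each of the four Unicode normalization forms."""
    return {f: unicodedata.normalize(f, s) for f in ("NFC", "NFD", "NFKC", "NFKD")}

for name, result in forms(word_a).items():
    # Show each result as a list of code points, e.g. "U+0E01 U+0E48 U+0E33"
    print(name, " ".join(f"U+{ord(ch):04X}" for ch in result))
```

NFC and NFD leave U+0E33 in place, because SARA AM has only a compatibility decomposition. NFKC and NFKD both yield U+0E01 U+0E48 U+0E4D U+0E32; NFKC does not recompose a compatibility decomposition.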


markus

Samphan Raruenrom wrote:

> Mark Davis wrote:
>  >>- decomposition of SARA AM adds more problems to normalization
>  > I don't recall seeing that note; I'll look forward to your report.
> 
> Please see my discussion with khun Peter Constable quoted below.


--- 8< ---


> 2)
> 
> 0E32;THAI CHARACTER SARA AA;Lo;0
> 0E48;THAI CHARACTER MAI EK;Mn;107
> 0E33;THAI CHARACTER SARA AM;Lo;0;L;<compat> "NIKHAHIT" "SARA AA"
> 
> There are two ways to represent the word KO KAI + MAI EK + SARA AM:
> 
> (a) KO KAI + MAI EK + SARA AM
> (b) KO KAI + NIKHAHIT + MAI EK + SARA AA
> 
> (b) must be in this sequence to get the intended look for
> the word (though this is not a valid sequence for Thai/WTT);
> that is, the mai-ek sits on top of the nikhahit.
> 
> The problem is that the NFKD/NFKC form of (a) is
> 
> (c) KO KAI + MAI EK + NIKHAHIT + SARA AA
> 
> This will be rendered with the nikhahit on top of the mai-ek,
> which is not the same as (a) and is not the intended look.
> So the string changes its shape after normalization.
> Is this a violation of any principle?
> 
> The problem also comes from the fact that the combining class
> of NIKHAHIT is 0, which makes reordering of (c) impossible.
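The reordering point can be checked with a short Python sketch using the standard `unicodedata` module (the constant names are illustrative). Canonical ordering only swaps adjacent characters when both have a non-zero combining class, so a class-0 NIKHAHIT acts as a barrier that MAI EK (class 107) cannot cross:

```python
import unicodedata

KO_KAI, MAI_EK, NIKHAHIT, SARA_AA, SARA_AM = "\u0E01", "\u0E48", "\u0E4D", "\u0E32", "\u0E33"

word_a = KO_KAI + MAI_EK + SARA_AM              # (a)
word_b = KO_KAI + NIKHAHIT + MAI_EK + SARA_AA   # (b), the intended look
word_c = unicodedata.normalize("NFKD", word_a)  # (c), what NFKD produces

print(unicodedata.decomposition(SARA_AM))       # <compat> 0E4D 0E32
print(unicodedata.combining(MAI_EK))            # 107
print(unicodedata.combining(NIKHAHIT))          # 0
# MAI EK stays before NIKHAHIT, so (c) differs from (b)
print(word_c == KO_KAI + MAI_EK + NIKHAHIT + SARA_AA)  # True
print(word_c == word_b)                         # False
# (b) itself is stable under NFKC: nothing to decompose, reorder, or compose
print(unicodedata.normalize("NFKC", word_b) == word_b)  # True
```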
