The "SARA AM" problem seems to lie with compatibility decomposition, which only NFKD and NFKC apply. The NFK* forms change many characters and strings, not just Thai, in various visible and functional ways, and must be used with caution.
markus

Samphan Raruenrom wrote:
> Mark Davis wrote:
>
>>- decomposition of SARA AM adds more problems to normalization
>
> I don't recall seeing that note; I'll look forward to your report.
>
> Please see my discussion with khun Peter Constable quoted below.
> --- 8< ---
> 2)
>
> 0E32;THAI CHARACTER SARA AA;Lo;0
> 0E48;THAI CHARACTER MAI EK;Mn;107
> 0E33;THAI CHARACTER SARA AM;Lo;0;L;<compat> "NIKHAHIT" "SARA AA"
>
> There are two ways to represent the word KO KAI + MAI EK + SARA AM
>
> (a) KO KAI + MAI EK + SARA AM
> (b) KO KAI + NIKHAHIT + MAI EK + SARA AA
>
> (b) must be in this sequence to get the intended look for
> the word (not that this is the valid sequence for Thai/WTT).
> That is, the mai-ek is on top of the nikhahit.
>
> The problem is with the NFKD/NFKC of (a), which is
>
> (c) KO KAI + MAI EK + NIKHAHIT + SARA AA
>
> Which will be rendered with nikhahit on top of mai-ek.
> Which is not the same as (a), and is not the intended look.
> So this means that the string changes its shape after
> normalization. Is this a violation of any principle?
>
> The problem also comes from the fact that the combining class of
> NIKHAHIT is 0, and that makes reordering of (c) impossible.
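
For anyone who wants to reproduce the (a)/(b)/(c) case quoted above, here is a minimal sketch using Python's standard unicodedata module. The variable names and the names() helper are mine, for illustration only; the code points are the ones listed in the quoted UnicodeData lines, plus U+0E01 for KO KAI and U+0E4D for NIKHAHIT. It only compares code point sequences; how a given font renders (b) versus (c) is outside what it can show.

```python
import unicodedata

# Code points from the quoted discussion (names per the Unicode Character Database).
KO_KAI, MAI_EK, SARA_AM = "\u0E01", "\u0E48", "\u0E33"
NIKHAHIT, SARA_AA = "\u0E4D", "\u0E32"

a = KO_KAI + MAI_EK + SARA_AM             # (a) as typed
b = KO_KAI + NIKHAHIT + MAI_EK + SARA_AA  # (b) order that gives the intended look
c = KO_KAI + MAI_EK + NIKHAHIT + SARA_AA  # (c) result reported for NFKD/NFKC of (a)


def names(s):
    """Hypothetical helper: short character names, for readable output."""
    return [unicodedata.name(ch).replace("THAI CHARACTER ", "") for ch in s]


nfkd_a = unicodedata.normalize("NFKD", a)
nfkc_a = unicodedata.normalize("NFKC", a)
nfc_a = unicodedata.normalize("NFC", a)

print(names(nfkd_a))
print("NFKD(a) == (c):", nfkd_a == c)
print("NFKC(a) == (c):", nfkc_a == c)
print("NFKD(a) == (b):", nfkd_a == b)
print("NFC(a)  == (a):", nfc_a == a)

# Canonical reordering only swaps adjacent marks with non-zero combining
# classes; NIKHAHIT has class 0, so it acts as a barrier after MAI EK (107).
print("ccc MAI EK:", unicodedata.combining(MAI_EK))
print("ccc NIKHAHIT:", unicodedata.combining(NIKHAHIT))
```

The canonical forms (NFC, NFD) leave (a) untouched, so the reordering issue only appears once a compatibility form is applied.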

