On 27/10/2003 18:06, Philippe Verdy wrote:

From: "Peter Kirk" <[EMAIL PROTECTED]>

Thanks for the clarification. In principle we might be able to go a little further: we could define both <c, CCO> and <CCO, c> as canonically equivalent to c for all c in combining class zero. This would have to be some kind of decomposition exception so that c is never decomposed by adding CCO before or after it. This would not remove CCO between two combining characters, so, if 0<c1<c2, <c1, c2> and <c1, CCO, c2> would remain not canonically equivalent while logically equivalent. In practice this would be a small price to pay as it is relevant only in the almost unique case of two vowels on one consonant which actually happen to be in canonical order.

Why so?


As CCO is not defined in any past version, the stability pact does
not say that we must forbid its _removal_ when computing the NFC, NFD,
NFKC or NFKD forms. It only says that we must _not insert_ it into a
source string <c1, c2> where c1 and c2 are already assigned.

So we are fine: we can define a canonical equivalence between
<c1, CCO, c2> and <c1, c2>, where the latter is simultaneously in
NFC, NFD, NFKC and NFKD forms, for all pairs (c1, c2) such that
CC(c1)<=CC(c2) or CC(c2)=0.
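A minimal Python sketch of the rule just stated, for illustration only. CCO was a proposed character with no assigned code point, so the sketch assumes a hypothetical stand-in (the Private Use Area code point U+E000); the function name strip_superfluous_cco is likewise made up. Combining classes come from Python's standard unicodedata.combining(). Only a CCO with a character on both sides is considered; a CCO at either end of the string is left alone.

```python
import unicodedata

# HYPOTHETICAL: CCO never received a code point; U+E000 (Private Use Area)
# stands in for it here purely for illustration.
CCO = "\uE000"


def cc(ch: str) -> int:
    """Canonical combining class, as reported by Python's unicodedata."""
    return unicodedata.combining(ch)


def strip_superfluous_cco(s: str) -> str:
    """Drop each CCO between c1 and c2 when CC(c1) <= CC(c2) or CC(c2) == 0,
    i.e. when <c1, c2> is already in canonical order, so the CCO blocks nothing.
    A CCO with no character on one side is kept unchanged."""
    out: list[str] = []
    for i, ch in enumerate(s):
        if ch == CCO and out and i + 1 < len(s):
            c1, c2 = out[-1], s[i + 1]  # c1 is the last character kept so far
            if cc(c1) <= cc(c2) or cc(c2) == 0:
                continue  # superfluous: removing it changes no ordering
        out.append(ch)
    return "".join(out)


# Hebrew letter LAMED (class 0) with points HIRIQ (class 14) and PATAH (class 17)
LAMED, HIRIQ, PATAH = "\u05DC", "\u05B4", "\u05B7"
print(strip_superfluous_cco(LAMED + HIRIQ + CCO + PATAH) == LAMED + HIRIQ + PATAH)  # True
print(strip_superfluous_cco(LAMED + PATAH + CCO + HIRIQ) == LAMED + PATAH + CCO + HIRIQ)  # True
```

In <lamed, hiriq, CCO, patah> the marks are already in canonical order (14 before 17), so the CCO is dropped; in <lamed, patah, CCO, hiriq> it is kept, since without it canonical reordering would swap the two points.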

But we cannot define it within the UCD; it would have to be defined
algorithmically, as for Hangul syllables and jamos.

My point here was that we might be able to do this within the existing normalisation algorithm, or with a minor change to add decomposition exclusions. I am not sure that I want to push a major change to normalisation to support three-character canonical equivalence, and I predict that we would find it hard to get it through the UTC for such a marginal case. My simple two-character composition, <c, CCO> => c and <CCO, c> => c where cc(c)=0, is adequate for removing the vast majority of superfluous CCOs. The only real issue is to ensure that this composition is not reversed on decomposition.
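A minimal Python sketch of the two-character rule, <c, CCO> => c and <CCO, c> => c where cc(c)=0, again assuming a hypothetical stand-in code point (U+E000) for CCO; the helper names are made up for illustration. It only shows which CCOs the composition would remove. It does not model the decomposition-exclusion side, i.e. how a normaliser would avoid reinserting CCO on decomposition. Treating a run of consecutive CCOs as a unit is a choice of this sketch, not something specified above.

```python
import unicodedata
from typing import Optional

# HYPOTHETICAL stand-in for the proposed (never encoded) CCO character.
CCO = "\uE000"


def neighbour(s: str, i: int, step: int) -> Optional[str]:
    """Nearest non-CCO character from position i in direction step (+1 or -1)."""
    j = i + step
    while 0 <= j < len(s) and s[j] == CCO:
        j += step
    return s[j] if 0 <= j < len(s) else None


def is_starter(ch: Optional[str]) -> bool:
    """True for an actual character of canonical combining class 0."""
    return ch is not None and unicodedata.combining(ch) == 0


def compose_cco_with_starters(s: str) -> str:
    """Apply <c, CCO> => c and <CCO, c> => c for cc(c) == 0.
    A CCO whose nearest neighbours are both combining marks is kept."""
    return "".join(
        ch
        for i, ch in enumerate(s)
        if ch != CCO
        or not (is_starter(neighbour(s, i, -1)) or is_starter(neighbour(s, i, +1)))
    )


LAMED, HIRIQ, PATAH = "\u05DC", "\u05B4", "\u05B7"  # classes 0, 14, 17
# The "small price": this CCO is superfluous (14 < 17) but the rule keeps it.
print(compose_cco_with_starters(LAMED + HIRIQ + CCO + PATAH) == LAMED + HIRIQ + CCO + PATAH)  # True
```

CCOs next to a base letter disappear, while one between two vowel points always survives, even in the hiriq-before-patah order where it blocks nothing.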

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/