On 27/10/2003 18:06, Philippe Verdy wrote:
From: "Peter Kirk" <[EMAIL PROTECTED]>
>> Thanks for the clarification. In principle we might be able to go a
>> little further: we could define both <c, CCO> and <CCO, c> as
>> canonically equivalent to c for all c in combining class zero. This
>> would have to be some kind of decomposition exception, so that c is
>> never decomposed by adding CCO before or after it. This would not
>> remove CCO between two combining characters, so, if
>> 0 < cc(c1) < cc(c2), <c1, c2> and <c1, CCO, c2> would remain
>> canonically inequivalent even though they are logically equivalent.
>> In practice this would be a small price to pay, as it is relevant
>> only in the almost unique case of two vowels on one consonant which
>> actually happen to be in canonical order.
> Why is that?
>
> As CCO is not defined in any past version of Unicode, the stability
> pact does not say that we must forbid its _removal_ when computing the
> NFC, NFD, NFKC or NFKD forms. It only says that we must _not insert_
> it into a source string <c1, c2> where c1 and c2 are already assigned.
>
> So we are fine: we can define a canonical equivalence between
> <c1, CCO, c2> and <c1, c2>, where the latter is simultaneously in
> NFC, NFD, NFKC and NFKD form, for every pair (c1, c2) such that
> CC(c1) <= CC(c2) or CC(c2) = 0.
>
> But we cannot define it within the UCD; it would have to be defined
> algorithmically, like the Hangul syllable/jamo compositions...
My point here was that we might be able to do this within the existing
normalisation algorithm, or with a minor change to add decomposition
exclusions. I am not sure that I want to push a major change to
normalisation to support three-character canonical equivalence, and I
would predict that we would find it hard to get it through the UTC for
such a marginal case. My simple two-character composition, <c, CCO> => c
and <CCO, c> => c where cc(c) = 0, is adequate for removing the vast
majority of superfluous CCOs. The only real issue is to ensure that
this composition is not reversed on decomposition.
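
For concreteness, here is a minimal sketch in Python of the filtering
step that this two-character composition implies. It is only an
illustration under stated assumptions: CCO is hypothetical and has no
code point, so the sketch borrows a Private Use Area value, and the
function name is mine. Nothing here reinserts CCO on decomposition,
which is the property that matters.

    import unicodedata

    # Hypothetical stand-in code point for CCO (no real assignment
    # exists; a Private Use Area value is used purely for illustration).
    CCO = "\uE000"

    def drop_superfluous_cco(s: str) -> str:
        """Apply the proposed compositions <c, CCO> => c and
        <CCO, c> => c: a CCO is dropped whenever the character on
        either side of it has combining class zero.  A CCO lying
        strictly between two non-zero combining classes is kept,
        as discussed above."""
        out = []
        for i, ch in enumerate(s):
            if ch == CCO:
                prev_is_base = (out and out[-1] != CCO
                                and unicodedata.combining(out[-1]) == 0)
                nxt = s[i + 1] if i + 1 < len(s) else None
                next_is_base = (nxt is not None and nxt != CCO
                                and unicodedata.combining(nxt) == 0)
                if prev_is_base or next_is_base:
                    continue        # <c, CCO> => c  or  <CCO, c> => c
            out.append(ch)
        return "".join(out)

    # Example: a base letter, CCO, then two combining marks already in
    # canonical order (classes 220 then 230).  The CCO next to the base
    # letter is dropped, leaving <a, U+0323, U+0301>.
    filtered = drop_superfluous_cco("a" + CCO + "\u0323\u0301")
    assert filtered == "a\u0323\u0301"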
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/