On Mon, 11 Feb 2013 02:45:27 +0100 Philippe Verdy <[email protected]> wrote:
> 2013/2/10 Richard Wordingham <[email protected]>:
> The term "pathological" could apply to these cases where a "naive"
> implementation may in fact break the expectations. How then can a
> collator become a "conforming" process if it has to differentiate
> canonically equivalent input strings?

There is a UCA collation option, 'normalization' set to 'off', which
allows such incorrect operation if strings are not FCD. (Both NFC and
NFD strings are FCD.) The UCA and LDML definitions *still* together
falsely claim that omitting normalisation will give the correct result
on FCD strings; counter-examples include default collation <U+0F71
TIBETAN VOWEL SIGN AA, U+0F73 TIBETAN VOWEL SIGN II> and Danish (still
at CLDR Version 22.1) <U+0061 LATIN SMALL LETTER A, U+00E5 LATIN SMALL
LETTER A WITH RING ABOVE>.

Richard.
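For anyone who wants to check the two counter-examples, here is a rough Python sketch of an FCD test. It assumes the usual description of FCD: for each character after the first, the leading combining class of its canonical decomposition is either zero or not less than the trailing combining class of the previous character's decomposition. The names is_fcd, TIBETAN and DANISH are made up for illustration, and the sketch only shows that both strings are FCD without being NFD. It does not run a collator, so it does not itself reproduce the wrong ordering.

```python
import unicodedata


def is_fcd(s: str) -> bool:
    """Return True if s passes the FCD check as described above.

    Illustrative helper, not part of any library.
    """
    prev_trail = 0
    for ch in s:
        # Full canonical decomposition of this single character.
        decomp = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(decomp[0])
        trail = unicodedata.combining(decomp[-1])
        # A non-zero leading class lower than the previous trailing
        # class means normalisation would have to reorder marks.
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True


# The two counter-examples from the message.
TIBETAN = "\u0f71\u0f73"  # VOWEL SIGN AA, VOWEL SIGN II
DANISH = "a\u00e5"        # a, a with ring above

for text in (TIBETAN, DANISH):
    nfd = unicodedata.normalize("NFD", text)
    # Both strings are FCD, yet neither is already in NFD, so a collator
    # that skips normalisation sees a different code point sequence.
    print(is_fcd(text), text == nfd, [f"U+{ord(c):04X}" for c in nfd])
```

In NFD the Tibetan string becomes <U+0F71, U+0F71, U+0F72> and the Danish one becomes <U+0061, U+0061, U+030A>, which is where contractions in the tailoring can start to match differently from the unnormalised input.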

