2013/2/10 Richard Wordingham <[email protected]>:
> Order is a problem when one has collating elements composed of multiple
> characters of different non-zero canonical combining classes. In
> practice this could be solved by adding more collating elements, but
> in theory the number of combinations to be considered could be
> unbounded. The UCA defines the interpretation in terms of the NFD
> form, and occasionally it is necessary to reduce strings to NFD form to
> determine this interpretation. Only having to consider primary weights
> can reduce this problem, but it does not always remove the problem.

It's a good point, but this does not break the UCA algorithm itself, which includes a step at which external preprocessing is possible, even if NFD helps reduce the number of cases. That help comes with a condition: normalization must not strip prior differences, i.e. the conversion to NFD has to be applied *after* the preprocessing step. In that case the number of cases to handle during the preprocessing will be higher, and implementing this preprocessing may be more complex than expected in some languages.

The term "pathological" could apply to these cases, where a "naive" implementation may in fact break expectations. How then can a collator be a "conforming" process if it has to differentiate canonically equivalent input strings?
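To make the ordering problem concrete, here is a minimal sketch in Python using only the standard `unicodedata` module. The contraction table `CONTRACTIONS`, its weight label, and the two lookup helpers are hypothetical names invented for illustration; they are not taken from the UCA data files or from any real collation library. The sketch only shows that two canonically equivalent strings can miss or hit the same table entry depending on whether NFD is applied before the lookup.

```python
import unicodedata

# Two canonically equivalent spellings: a base letter followed by
# COMBINING DOT BELOW (ccc=220) and COMBINING DOT ABOVE (ccc=230),
# in either order.
s1 = "a\u0323\u0307"  # dot below, then dot above (already in NFD order)
s2 = "a\u0307\u0323"  # dot above, then dot below (not in NFD order)

# Hypothetical contraction table, keyed on the NFD spelling only.
CONTRACTIONS = {"a\u0323\u0307": "weight-a-dotbelow-dotabove"}


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def naive_lookup(s: str):
    """Look up the string as given, with no normalization."""
    return CONTRACTIONS.get(s)


def nfd_lookup(s: str):
    """Normalize to NFD first, then look up the contraction."""
    return CONTRACTIONS.get(nfd(s))


if __name__ == "__main__":
    print(unicodedata.combining("\u0323"), unicodedata.combining("\u0307"))
    print(s1 == s2, nfd(s1) == nfd(s2))
    print(naive_lookup(s1), naive_lookup(s2))
    print(nfd_lookup(s1), nfd_lookup(s2))
```

The naive lookup finds the contraction for `s1` but not for `s2`, although the two are canonically equivalent. Avoiding NFD would mean listing every permitted reordering of the marks as a separate table entry, which is where the number of cases grows.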

