> I think, for a script representing usually one language,
> allkeys.txt defines fairly acceptable collation order.
> For example, order of hiragana and katakana is approximately
> compliant with the custom of the Japanese language.
>
> In contrast, for a script representing many languages
> (say, the Latin script), tailoring may be often necessary.
>
> E.g. 'Ä' is sorted as A-umlaut (sometimes as 'AE') in German,
> and as one of additional letters ordered after 'Z' in some
> northern-european languages.

Yup, that is the case in Finnish and Swedish, and Danish and Norwegian
do similar things with their "a" and "o" equivalents. This means it is
logically impossible to sort a list containing both German and Swedish
names "right". Many European languages sort some consonant+h
combinations after the base consonant as a separate "letter", and so
forth. And I believe many of the CJK languages in fact have several
(and differing) customary sorting orders.

Even when staying within a single language, one must decide whether to
do things like "dictionary sorting" (spaces etc. removed), how lowercase
and uppercase sort (A < B < a, A < a < B, a < A < B, or a == A < B),
what to do with things like articles, etc. So one must always either
accept "good enough" sorting or customize more or less heavily.

> But according to Unicode default collation, 'Ä' is ordered
> as a modified 'A' and equal to 'A' at the primary level.
>

--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen
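
To make the choices above concrete, here is a minimal sketch in Python of the four case orderings and of the German versus Swedish treatment of 'Ä'. Everything in it is an assumption made for illustration: the function names are hypothetical, the case keys only handle plain ASCII letters, and the "German" and "Swedish" keys are crude stand-ins for tailoring, not the Unicode Collation Algorithm and not anything derived from allkeys.txt.

```python
import unicodedata

# --- Four case policies (ASCII letters only, illustrative) ---

def key_codepoint(word):
    # A < B < a: plain code point order, uppercase block first.
    return word

def key_upper_first(word):
    # A < a < B: compare ignoring case, uppercase wins ties.
    return (word.lower(), [c.islower() for c in word])

def key_lower_first(word):
    # a < A < B: compare ignoring case, lowercase wins ties.
    return (word.lower(), [c.isupper() for c in word])

def key_ignore_case(word):
    # a == A < B: case is not a difference at all.
    return word.lower()

# --- Two incompatible treatments of 'Ä' (illustrative) ---

def key_german(word):
    # 'Ä' counts as 'A' first; the original string only breaks ties.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (stripped.casefold(), word)

# Assumed alphabet order: å, ä, ö are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {c: i for i, c in enumerate(SWEDISH_ALPHABET)}

def key_swedish(word):
    # Unknown characters sort after the whole alphabet.
    folded = unicodedata.normalize("NFC", word).casefold()
    return [SWEDISH_RANK.get(c, len(SWEDISH_ALPHABET)) for c in folded]

letters = ["b", "a", "B", "A"]
print(sorted(letters, key=key_codepoint))    # ['A', 'B', 'a', 'b']
print(sorted(letters, key=key_upper_first))  # ['A', 'a', 'B', 'b']
print(sorted(letters, key=key_lower_first))  # ['a', 'A', 'b', 'B']

names = ["Zander", "Äkesson", "Andersson"]
print(sorted(names, key=key_german))   # ['Äkesson', 'Andersson', 'Zander']
print(sorted(names, key=key_swedish))  # ['Andersson', 'Zander', 'Äkesson']
```

The last two lines show the problem with a mixed list: the same three names come out in two different orders, and neither key function can satisfy both conventions at once.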