From: "E. Keown" <[EMAIL PROTECTED]>
> Thank you Philippe for taking the time to explain. I
> originally wanted to be a digital lexicographer, so I
> am interested in perfect collation.
You're welcome! I hope my explanation of the basic concepts was useful. In fact, the Unicode collation algorithm is somewhat more complex, because it takes into account subtler features needed to cover various languages. My examples were greatly simplified compared to what you can do with Unicode collation.

> I assume that Philippe's 'DUCET' and Michael Everson's
> "default template" refer to the same item. And
> Unicode-compliant software will support DUCET.

"DUCET" is referenced in the Unicode standard that documents collation. It is a prebuilt table of collation "weights" (the term for the comparable numeric values that allow matching and ordering characters and strings), computed from what is really a standardized (but tailorable) default collation order, plus some arbitrary numeric thresholds and arbitrary "gap" values (which simplify some implementations of tailoring, because inserting an entry does not require renumbering the weights).

A fully Unicode-compliant collation algorithm implementing the DUCET is not required to use the same weights, only to keep their relative order and composition. The introductory message described what could be done; the UTS document describes things in more detail.
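
To make the point about gaps and relative order concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the weights are invented (they are not DUCET values), only a single weight level is modelled, and the names `build_table`, `tailor_insert`, `renumber` and `sort_key` are hypothetical helpers, not part of any Unicode library.

```python
# Toy model only: invented single-level weights, NOT actual DUCET values.
GAP = 16  # hypothetical spacing left between consecutive default weights

N_TILDE = "\u00f1"  # LATIN SMALL LETTER N WITH TILDE


def build_table(ordered_chars, gap=GAP):
    """Assign increasing weights, leaving unused values between entries."""
    return {ch: (i + 1) * gap for i, ch in enumerate(ordered_chars)}


def tailor_insert(table, new_char, after_char):
    """Order new_char right after after_char by using a free value in the gap."""
    weight = table[after_char] + 1
    if weight in table.values():
        raise ValueError("gap exhausted; the table would need renumbering")
    tailored = dict(table)  # leave the default table untouched
    tailored[new_char] = weight
    return tailored


def renumber(table):
    """Replace every weight by its rank: different numbers, same relative order."""
    by_weight = sorted(table, key=table.get)
    return {ch: rank for rank, ch in enumerate(by_weight, start=1)}


def sort_key(table):
    """Build a key function mapping a string to its list of weights."""
    return lambda s: [table[ch] for ch in s]


default = build_table("abcdenoz")
# Tailoring: treat n-tilde as a separate letter sorted after 'n'.
tailored = tailor_insert(default, N_TILDE, "n")
compact = renumber(tailored)

words = ["oca", N_TILDE + "e", "no", "nana", "zoo"]
by_tailored = sorted(words, key=sort_key(tailored))
by_compact = sorted(words, key=sort_key(compact))

print(by_tailored)
print(by_tailored == by_compact)
```

The insertion reuses a free value inside the gap, so no existing weight changes. The renumbered table uses entirely different numbers yet sorts the words identically, which is the sense in which only the relative order of the weights matters in this simplified model.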

