Peter Kirk noted:

> > PS Multi-language bibliographies are common in Russian books. They are
> > usually printed with the Latin script entries following the Cyrillic
> > script ones, but I have seen interleaved ones.
Chris Jacobs noted:

> has an index in which greek and latin script are interleaved.
>
> The greek words are sorted according to their transliteration:
>
> ̔ sorts as h
> φ sorts as ph

These illustrate the typical situation with cross-script, cross-language
interfiling: they are *custom* solutions for particular indexing
problems, and they may involve transliteration or other adaptation to
make like match with like for the purposes of the people using the
interfiled list.

Such tasks should *not* be attributed to the default collation element
table for the Unicode Collation Algorithm. That is simply inappropriate
design, failing to separate functions into appropriate layers. Throwing
too many requirements at the default table has at least two bad
results:

A. It makes the table itself more complex, which means that *all*
implementations that deal with it have to handle additional
complexity -- complexity that is irrelevant to all but a small minority
of specialized users of sorting.

B. It makes it more difficult to figure out how to tailor and customize
the base tables and their behavior for those instances where something
really specialized actually *is* needed (such as the Greek and Latin
index cited above).

It is the same kind of error, in my opinion, as designing a language
parser and then requiring that it handle character input in any
encoding. If that task is pushed into the *lexer* itself, you end up
with an unholy mess. The correct design is to use a properly
architected character set conversion module, convert all the input into
Unicode, and design the lexer to handle Unicode character input.

Mike Ayers is on the right track here, I believe. The scenarios which
people are adducing in arguing for interfiling should instead be
addressed by appropriately designed normalizations -- which can be
implemented using fairly easy-to-program, reusable scripts.
Then sort on the *normalized* data, using a much, much simpler
collation table, to accomplish what you need.

People who expect to import their particular normalization needs *into*
the default collation element table -- expecting thereby to get the
behavior they want "for free," right off the shelf, from the Windows
sorting APIs -- are in effect doing harm to all users of the UCA,
without actually buying themselves the flexibility they need to
accomplish their task in the end anyway.

--Ken
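[Editorial note: to make the normalize-then-sort suggestion concrete,
here is a minimal sketch in Python. The transliteration table is a
small, hypothetical fragment chosen to match Chris Jacobs's examples
(rough breathing sorts as "h", φ as "ph"); it is not a complete or
standard romanization scheme.]

```python
import unicodedata

# Hypothetical, partial Greek-to-Latin transliteration table,
# for illustration only.
GREEK_TO_LATIN = {
    "α": "a", "β": "b", "γ": "g", "ι": "i", "λ": "l",
    "ο": "o", "σ": "s", "ς": "s", "φ": "ph", "ω": "o",
}
ROUGH_BREATHING = "\u0314"  # sorts as "h", before its vowel

def sort_key(word: str) -> str:
    """Normalize a headword to a Latin-script key; Latin input passes through."""
    out = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch == ROUGH_BREATHING:
            # After decomposition the breathing follows its vowel;
            # move the "h" in front of that vowel for sorting.
            out.insert(len(out) - 1, "h")
        elif unicodedata.combining(ch):
            continue  # drop accents for collation purposes
        else:
            out.append(GREEK_TO_LATIN.get(ch, ch))
    return "".join(out)

entries = ["φίλος", "logos", "λόγος", "high", "ἁγιος"]
print(sorted(entries, key=sort_key))
```

Once the data is normalized this way, a plain lexicographic comparison
on the keys interleaves the two scripts; no change to the default UCA
table is needed.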

