(John Cowan made this basic point, but I can't help making it more forcefully.)
This whole discussion of interleaved sorting has veered off into the ditch. Now maybe I haven't been paying close enough attention (quite possible, as I pretty much lost patience with the whole Phoenician thread a LONG time ago), but I'm pretty sure the whole suggestion to interleave Phoenician and Hebrew in the sort order, with equivalent letters having the same primary weights, was originally floated as a way to bridge the gap between those who say their lives will be made easier by assigning a new set of codes to the Phoenician letters and those who say their lives will be made harder by doing this.

If you've got a group of people who say "We don't want this script encoded" (call them A) and another group saying "We do want this script encoded" (call them B), you (generally) have to favor the user community that wants the new encoding (group B). After all, group A can simply choose to ignore the new code range and keep doing things the way they were doing them.

The only way group A is hurt by the new encoding is if they have to deal with documents that use the new encoding (say, the rule to use the old encoding isn't universally followed by members of group A, or they occasionally have to work with documents produced by members of group B). If members of group A have to work with documents using both conventions, searching for a particular word won't necessarily work -- if you search using group A's convention, it won't find words encoded using group B's convention.

The way around this problem is to use a tailored collation order that treats both conventions as equivalent. (Actually, you might also want to design Phoenician fonts whose cmap tables map both the new Phoenician range and the existing Hebrew range to the same set of glyphs, so that both group A and group B can use the same fonts.)

That's how we got here. The effect it has on sorted lists of words seems pretty uninteresting to me. I can think of two use cases:

1. A sorted list of Phoenician words (or words using the Phoenician script range, in whatever language or script) that mixes encoding conventions -- some words use the Phoenician script range and some use the existing Hebrew range. Same letters, same glyphs, different underlying encoding. You want to hide the difference in underlying encoding from the end user.

2. A sorted list of Hebrew words, some in modern Hebrew script and some in Paleo-Hebrew (or some other script that uses the Phoenician range). Same language, different glyphs.

Both are justification for an interleaved sort order, but really, how often will either use case come up? Do you really expect -- in EITHER case -- to have long lists of words that need to be mechanically sorted? Do you expect it to happen often enough that hacking together a Perl script to do it once isn't going to get the job done? Why is this a burning issue that has to be enshrined in the default UCA sort order?

Of course, I could also ask the reverse question: Given that it's a very tiny community of users that's going to give a dang about the Phoenician characters in the first place, would it hurt anyone to put this in the default UCA ordering? [Not Is It The Right Thing To Do, which I've seen a lot of in this discussion, but Who Does It Hurt?]

--Rich Gillam
Language Analysis Systems, Inc.

