RE: interleaved ordering (was RE: Phoenician)

Mike Ayers Thu, 13 May 2004 13:33:41 -0700

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Dean Snyder
> Sent: Thursday, May 13, 2004 10:36 AM

> Rich Gillam of Language Analysis Systems, Inc., Unicode list reader,
> wrote at 11:41 AM on Thursday, May 13, 2004:

> ...
> >That's how we got here. The effect it has on sorted lists of words
> >seems pretty uninteresting to me. I can think of two use cases:
> >
> >1. A sorted list of Phoenician words (or words using the Phoenician
> >script range, in whatever language or script) that mixes encoding
> >conventions-- some words use the Phoenician script range and some use
> >the existing Hebrew range. Same letters, same glyphs, different
> >underlying encoding. You want to hide the difference in underlying
> >encoding from the end user.
> >
> >2. A sorted list of Hebrew words, some in modern Hebrew script and some
> >in Paleo-Hebrew (or some other script that uses the Phoenician range).
> >Same language, different glyphs.
> >
> >Both are justification for an interleaved sort order,

No. Both are situations where the data should be normalized before sorting. In the first case, convert the data to a single encoding convention. In the second case, convert all the non-Hebrew data (the Paleo-Hebrew text in the Phoenician range) to the Hebrew range. Then sort away.
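As a minimal sketch of normalize-then-sort for the second case: assuming the Phoenician block as it was later encoded (U+10900 PHOENICIAN LETTER ALF through U+10915 PHOENICIAN LETTER TAU, in the same 22-letter order as the Hebrew alphabet), the Paleo-Hebrew text can be folded letter for letter onto the Hebrew range before sorting. The names `PHOENICIAN_TO_HEBREW`, `normalize` and `normalize_then_sort` are illustrative only, and Python's plain code-point `sorted` stands in for a real collator:

```python
# Hebrew letters U+05D0..U+05EA, minus the five final forms, leave the 22
# base letters in alphabetical order (alef .. tav).
FINAL_FORMS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}  # final kaf, mem, nun, pe, tsadi
HEBREW_BASE = [cp for cp in range(0x05D0, 0x05EB) if cp not in FINAL_FORMS]

# Assumption: Phoenician letters U+10900 (ALF) .. U+10915 (TAU) map one-to-one,
# in order, onto the Hebrew base letters.
PHOENICIAN_TO_HEBREW = {0x10900 + i: cp for i, cp in enumerate(HEBREW_BASE)}


def normalize(word: str) -> str:
    """Re-encode any Phoenician-range letters in `word` as Hebrew letters."""
    return word.translate(PHOENICIAN_TO_HEBREW)


def normalize_then_sort(words: list[str]) -> list[str]:
    """Convert everything to one encoding first, then do an ordinary sort."""
    return sorted(normalize(w) for w in words)


if __name__ == "__main__":
    # Hebrew BET ALEF, and Phoenician ALF BET
    mixed = ["\u05D1\u05D0", "\U00010900\U00010901"]
    # ascii() keeps the output printable on non-UTF-8 consoles
    print(ascii(normalize_then_sort(mixed)))  # prints ['\u05d0\u05d1', '\u05d1\u05d0']
```

If the original Paleo-Hebrew forms must still be displayed, `sorted(words, key=normalize)` orders by the normalized form while keeping the words as entered. Code-point order is only a rough stand-in for Hebrew alphabetical order; for real data the normalized strings would go to a UCA-based collator such as ICU's.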

> > but really, how
> >often will either use case come up?
>
> Well, for just one case, if you're a Dead Sea scroll scholar (one of the
> more populated sub-disciplines in Semitic scholarship), all the time and
> every day.

You create daily sorts on the same data? Since I doubt that you expect new words to show up in there, this must mean that you are sorting different subsets of the existing data, yes? In that case, just re-sort the prenormalized data.

> >Do you really expect-- in EITHER case-- to have long lists of words that
> >need to be mechanically sorted?
>
> Yes.

Normalization makes for faster sorting than interfiling.

> >Do you expect it to happen often enough that hacking together a Perl
> >script to do it once isn't going to get the job done?
>
> Yes.

One normalization script could be used any number of times. Clip, normalize, sort; repeat as necessary.
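A minimal sketch of that clip, normalize, sort cycle, assuming (as a labeled assumption) that the Phoenician letters U+10900..U+10915 map letter for letter onto the 22 Hebrew base letters. The function name `clip_normalize_sort` and the `keep` selection predicate are hypothetical; the point is that the same normalization step is reused unchanged for every selection:

```python
from typing import Callable, Iterable

# The 22 Hebrew base letters (U+05D0..U+05EA without the five final forms).
FINAL_FORMS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
HEBREW_BASE = [cp for cp in range(0x05D0, 0x05EB) if cp not in FINAL_FORMS]

# Assumption: Phoenician ALF..TAU (U+10900..U+10915) map in order onto them.
PHOENICIAN_TO_HEBREW = {0x10900 + i: cp for i, cp in enumerate(HEBREW_BASE)}


def clip_normalize_sort(corpus: Iterable[str], keep: Callable[[str], bool]) -> list[str]:
    """Clip the corpus to the words of interest, normalize them, then sort."""
    clipped = (w for w in corpus if keep(w))                             # clip
    normalized = (w.translate(PHOENICIAN_TO_HEBREW) for w in clipped)    # normalize
    return sorted(normalized)                                            # sort (code-point order)


if __name__ == "__main__":
    corpus = ["\u05D1\u05D0", "\U00010900\U00010901", "\u05D2"]
    # Hypothetical selection: only two-letter words.
    print(ascii(clip_normalize_sort(corpus, lambda w: len(w) == 2)))
    # A different selection reuses the same normalization unchanged.
    print(ascii(clip_normalize_sort(corpus, lambda w: True)))
```

Normalizing once and storing the result, then clipping and re-sorting that stored copy, would work equally well and avoids repeating the translation step.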

> >Why is this a
> >burning issue that has to be enshrined in the default UCA sort order?
>
> [Or even a separate encoding for that matter?] Because of what lies
> behind the responses to your questions above.

I see no substance in your answers so far. Please clarify.

/|/|ike