Title: RE: interleaved ordering (was RE: Phoenician)

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Dean Snyder
> Sent: Thursday, May 13, 2004 10:36 AM

> Rich Gillam of Language Analysis Systems, Inc. Unicode list reader wrote
> at 11:41 AM on Thursday, May 13, 2004:

> ...
> >That's how we got here.  The effect it has on sorted lists of words
> >seems pretty uninteresting to me.  I can think of two use cases:
> >
> >1. A sorted list of Phoenician words (or words using the Phoenician
> >script range, in whatever language or script) that mixes encoding
> >conventions-- some words use the Phoenician script range and some use
> >the existing Hebrew range.  Same letters, same glyphs, different
> >underlying encoding.  You want to hide the difference in underlying
> >encoding from the end user.
> >
> >2. A sorted list of Hebrew words, some in modern Hebrew script and some
> >in Paleo-Hebrew (or some other script that uses the Phoenician range).
> >Same language, different glyphs.
> >
> >Both are justification for an interleaved sort order,

        No.  Both are situations where the data should be normalized before sorting.  In the first case, convert the data into a single encoding convention.  In the second case, convert all the non-Hebrew data to Hebrew.  Then sort away.
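        A minimal sketch of that normalize-then-sort step in Python, assuming a letter-for-letter correspondence between the Phoenician block (U+10900..U+10915, as later encoded in Unicode) and the 22 Hebrew base letters. The same fold handles both cases above. The function names are hypothetical, and plain code-point order stands in for a real collation such as a UCA implementation.

```python
# Sketch only: assumes the Phoenician block U+10900 (ALF) .. U+10915 (TAU)
# folds onto the 22 Hebrew base letters in alphabetical order.
# Hebrew final forms (U+05DA, U+05DD, U+05DF, U+05E3, U+05E5) are never produced.

HEBREW_BASE_LETTERS = [
    0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,  # alef..het
    0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,  # tet..ayin
    0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,                  # pe..tav
]

# Code point -> replacement string, the mapping form str.translate() accepts.
PHOENICIAN_TO_HEBREW = {
    0x10900 + i: chr(cp) for i, cp in enumerate(HEBREW_BASE_LETTERS)
}


def normalize(word: str) -> str:
    """Rewrite Phoenician-range letters as Hebrew; leave everything else alone."""
    return word.translate(PHOENICIAN_TO_HEBREW)


def normalize_then_sort(words: list[str]) -> list[str]:
    """Put every word into one encoding convention, then sort.

    Code-point order is a stand-in for a proper collation here; for the
    Hebrew base letters it happens to match alphabetical order."""
    return sorted(normalize(w) for w in words)
```

        For example, normalize_then_sort(["\U00010901\U00010900", "\u05d0\u05d1"]) folds the Phoenician bet+alef into Hebrew before comparing, so both entries land in one Hebrew ordering instead of two interleaved ones.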

> > but really, how
> >often will either use case come up?
>
> Well, for just one case, if you're a Dead Sea scroll scholar (one of the
> more populated sub-disciplines in Semitic scholarship) all the time and
> every day.

        You create daily sorts on the same data?  Since I doubt that you are expecting new words to show up in there, I think that this must mean that you are sorting different sets of the existing data, yes?  For such a case, just re-sort the prenormalized data.

> >Do you really expect-- in EITHER case-- to have long lists of words that
> >need to be mechanically sorted?
>
> Yes.

        Normalization makes for faster sorting than interfiling.

> >Do you expect it to happen often enough that hacking together a Perl
> >script to do it once isn't going to get the job done?
>
> Yes.

        One normalization script could be used any number of times.  Clip, normalize, sort - repeat as necessary.
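        A sketch of that reuse in Python, assuming the Phoenician block (U+10900..U+10915) maps letter-for-letter onto the 22 Hebrew base letters. The class and method names are made up for illustration: the corpus is normalized once, and each later request only clips a subset and sorts it, with code-point order standing in for a real collation.

```python
from typing import Callable, Iterable

# Assumed fold: Phoenician U+10900..U+10915 onto the Hebrew base letters
# alef..tav in alphabetical order (no final forms).
_HEBREW_BASE = (
    "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db"
    "\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea"
)
FOLD = {0x10900 + i: ch for i, ch in enumerate(_HEBREW_BASE)}


class NormalizedCorpus:
    """Hypothetical helper: normalize once, then clip and sort on demand."""

    def __init__(self, words: Iterable[str]) -> None:
        # The one-time normalization pass over the whole corpus.
        self._words = [w.translate(FOLD) for w in words]

    def sorted_subset(self, keep: Callable[[str], bool]) -> list[str]:
        # Clip (select the words of interest), then sort; no re-normalization.
        return sorted(w for w in self._words if keep(w))
```

        A call such as corpus.sorted_subset(lambda w: w.startswith("\u05d0")) can be repeated any number of times against the same prenormalized list.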

> >Why is this a
> >burning issue that has to be enshrined in the default UCA sort order?
>
> [Or even a separate encoding for that matter?] Because of what lies
> behind the responses to your questions above.

        I see no substance in your answers so far.  Please clarify.


/|/|ike
