Dean,

> >> >  One normalization script could be used any number of times.  Clip,
> >> >normalize, sort - repeat as necessary.
> >> 
> >> Multiply that times the number of independent researchers and separate
> >> projects...
> >
> >... and you get a thousand different requirements, each of which
> >should be addressed with appropriate levels of programming tools.
> 
> ... that are solved now by a single default process requiring no end user
> fiddling.

No, they are *not* "solved now by a single default process" -- you
don't get a thousand different sort orders out of a single
default process.

> >What gives you the slightest hope that *every* researcher's
> >particular needs for searching and sorting can be baked into
> >some *default* collation element weighting table? The whole point
> >is to provide a mechanism for people to *tailor* it as they choose
> >to meet *different* requirements.
> 
> No, that is not the whole point - 

Yes, it *is* the whole point -- of the Unicode Collation Algorithm.
Read the document. It is set up the way it is for a reason, and
it is to provide a mechanism for people to *tailor* the default
table to meet different requirements.

> there is also the point that 90% of our
> work, which is done now by simple, default processes, would, all of a
> sudden, require custom tailoring.

If sorting your data in binary order by code point is sufficient
for your work -- since that is what the "simple, default processes"
actually do -- then more power to you. Transliterate all your
data into Hebrew, using Unicode or ISO 8859-8 or Windows CP 1255
or MacHebrew -- it won't matter, since they all use the same
alphabetic order for the 22 letters, anyway. Then sort binary
and you're done.
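
To make the binary-order point concrete, here is a minimal sketch in Python. It assumes unpointed text (letters only); the word list and the helper names are made up for illustration. It sorts a few Hebrew words once by Unicode code point and once by their encoded bytes in ISO 8859-8 and Windows CP 1255, and the resulting order is the same, because those encodings lay the letters out in the same sequence.

```python
# Illustrative sample only: unpointed Hebrew words written as escapes.
WORDS = [
    "\u05e9\u05dc\u05d5\u05dd",  # shin lamed vav final-mem
    "\u05d2\u05de\u05dc",        # gimel mem lamed
    "\u05d0\u05d1",              # alef bet
    "\u05de\u05dc\u05da",        # mem lamed final-kaf
    "\u05d1\u05d9\u05ea",        # bet yod tav
]


def sort_by_code_point(words):
    """Python compares str values code point by code point."""
    return sorted(words)


def sort_by_encoded_bytes(words, encoding):
    """Encode each word, sort the raw bytes, then decode back for display."""
    encoded = sorted(word.encode(encoding) for word in words)
    return [raw.decode(encoding) for raw in encoded]


if __name__ == "__main__":
    for name in ("iso8859_8", "cp1255"):
        same = sort_by_encoded_bytes(WORDS, name) == sort_by_code_point(WORDS)
        print(name, "matches code point order:", same)
```

Note that this is a plain binary sort. The final letter forms have their own code points, so they are ordered by position in the code chart rather than by any linguistic rule.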

If you want to do anything *sophisticated* with your data, then
you are going to get involved with normalization and custom
tailoring of collations. You're also going to get involved with
*other* kinds of manipulations of the data, including lemmatizing
and transliterations, in order to get like to sort with like.
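
As a rough illustration of what "normalization and custom tailoring" can mean in practice, here is a small Python sketch. It is not the Unicode Collation Algorithm and it uses no collation library. It is only a hand-rolled sort key, under the assumption that a particular researcher wants points and accents ignored at the first level of comparison. The function name `tailored_key` and the two sample words are hypothetical.

```python
import unicodedata


def tailored_key(word):
    """Hypothetical tailoring: compare base letters first, marks only as a tiebreak.

    Step 1 (normalize): NFD puts combining marks into canonical order.
    Step 2 (clip): drop every character with a nonzero combining class.
    Step 3 (sort): the stripped letters form the primary key, and the full
    normalized string breaks ties between otherwise identical words.
    """
    decomposed = unicodedata.normalize("NFD", word)
    letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (letters, decomposed)


# Illustrative sample: one pointed word and one unpointed word.
POINTED = "\u05d1\u05bc\u05b7\u05d9\u05b4\u05ea"  # bet+dagesh+patah, yod+hiriq, tav
UNPOINTED = "\u05d1\u05d2\u05d3"                  # bet, gimel, dalet

if __name__ == "__main__":
    words = [POINTED, UNPOINTED]
    print("binary order:  ", sorted(words))
    print("tailored order:", sorted(words, key=tailored_key))
```

In binary order the pointed word comes first, because the dagesh (U+05BC) has a lower code point than gimel (U+05D2). With the tailored key the points are set aside, gimel sorts before yod, and the unpointed word comes first. A different project could plug in a different key, which is the kind of per-project choice at issue here.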

> >Nobody plans to take away your rights and ability to continue
> >doing what you now do, if it works very well for you. Please,
> >sir, continue doing what you are doing with your current data. :-)
> 
> It's incredible to me that you and others keep repeating this mantra,
> ignoring the fact (repeated for the nth time) that we will all be forced,
> in our separate research projects, to deal with MULTIPLE, COMPETING encodings.

You will not be "forced" to do anything other than what you are
doing currently. I keep repeating it because it apparently
bears repeating.

--Ken

