Dean,

> >> > One normalization script could be used any number of times. Clip,
> >> >normalize, sort - repeat as necessary.
> >>
> >> Multiply that times the number of independent researchers and separate
> >> projects...
> >
> >... and you get a thousand different requirements, each of which
> >should be addressed with appropriate levels of programming tools.
>
> ... that are solved now by a single default process requiring no end user
> fiddling.
No, they are *not* "solved now by a single default process" -- you don't get a thousand different sort orders out of a single default process.

> >What gives you the slightest hope that *every* researcher's
> >particular needs for searching and sorting can be baked into
> >some *default* collation element weighting table? The whole point
> >is to provide a mechanism for people to *tailor* it as they choose
> >to meet *different* requirements.
>
> No, that is not the whole point -

Yes, it *is* the whole point -- of the Unicode Collation Algorithm. Read the document. It is set up the way it is for a reason, and that reason is to provide a mechanism for people to *tailor* the default table to meet different requirements.

> there is also the point that 90% of our
> work, which is done now by simple, default processes, would, all of a
> sudden, require custom tailoring.

If sorting your data in binary order by code point is sufficient for your work -- since that is what the "simple, default processes" actually do -- then more power to you. Transliterate all your data into Hebrew, using Unicode or ISO 8859-8 or Windows CP 1255 or MacHebrew -- it won't matter, since they all use the same alphabetic order for the 22 letters anyway. Then sort binary and you're done.

If you want to do anything *sophisticated* with your data, then you are going to get involved with normalization and custom tailoring of collations. You're also going to get involved with *other* kinds of data manipulation, including lemmatization and transliteration, in order to get like to sort with like.

> >Nobody plans to take away your rights and ability to continue
> >doing what you now do, if it works very well for you. Please,
> >sir, continue doing what you are doing with your current data.
> >:-)
>
> It's incredible to me that you and others keep repeating this mantra,
> ignoring the fact (repeated for the nth time) that we will all be forced,
> in our separate research projects, to deal with MULTIPLE, COMPETING encodings.

You will not be "forced" to do anything other than what you are doing currently. I keep repeating it because it apparently bears repeating.

--Ken
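For illustration, here is a minimal Python sketch of the quoted "clip, normalize, sort" pipeline together with the binary code point sort described above. It makes several assumptions. "Clip" is taken to mean stripping combining marks such as Hebrew vowel points and cantillation. The function names `clip`, `clip_normalize_sort`, and `same_order_in_iso8859_8` are hypothetical. Python's built-in string comparison orders strings by code point, so `sorted()` performs the "binary" sort. In Unicode the Hebrew letters U+05D0..U+05EA follow the traditional alphabetic order, and each final form sits immediately before its non-final counterpart. The sketch does not implement the Unicode Collation Algorithm or any tailoring; that requires a full UCA implementation such as ICU.

```python
import unicodedata


def clip(text: str) -> str:
    """Hypothetical 'clip' step: decompose (NFD) and drop nonspacing marks
    (general category Mn), which covers Hebrew points and cantillation."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clip_normalize_sort(words):
    """Clip, NFC-normalize, then sort by raw code point order ("binary" sort)."""
    cleaned = [unicodedata.normalize("NFC", clip(w)) for w in words]
    return sorted(cleaned)  # str comparison in Python is by code point


def same_order_in_iso8859_8(words):
    """Check that byte-wise sorting in ISO 8859-8 gives the same order as
    code point sorting (the words must contain only unpointed letters)."""
    by_code_point = sorted(words)
    by_bytes = sorted(words, key=lambda w: w.encode("iso8859_8"))
    return by_code_point == by_bytes


if __name__ == "__main__":
    # Pointed forms of shalom, bayit, and av, written as escapes for clarity.
    words = [
        "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd",
        "\u05d1\u05b7\u05bc\u05d9\u05b4\u05ea",
        "\u05d0\u05b8\u05d1",
    ]
    result = clip_normalize_sort(words)
    print(result)
    print(same_order_in_iso8859_8(result))
```

Because the final forms are adjacent to their base letters in the code chart, plain binary order already handles well-formed words sensibly. Anything beyond that, such as ignoring or weighting points, or grouping lemmas and variant spellings, is where normalization and collation tailoring come in.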

