Peter Kirk said:

> Anyway, I don't see the main purpose of
> collation as producing lists of legible words, but rather as matching in
> text and database searches.

Collation is used for both purposes, of course. And there is nothing which requires you to use the same rules for sorting lists as for matching for searches. Just as a search might choose to ignore case, a search can be defined which would ignore specific script differences via a tailored weighting.

Thus for instance you could, right now, choose to implement a tailoring of the UCA default tables which would give Syriac letters the same weights as [square] Hebrew letters. You could then turn a search using that collation weighting loose on a corpus of Aramaic data in both Hebrew and Syriac script and get the kind of cross-script matching for identical Aramaic "underlying forms" that you are looking for, I presume.

Of course, none of that would be free out of the box from any OS, but with advanced tools like ICU it is not that difficult to create specialized collations along these lines and then use them to implement custom searches. It is a little more difficult to integrate them into off-the-shelf databases, but most databases implement some kind of capability for stored procedures, and you can create indexes off stored key fields that are built using such stored procedures. That should enable arbitrarily defined searching into data stores.

> I think that it just might be acceptable to encode
> the various ancient Semitic scripts separately if they are unified for
> collation.

As Michael indicated, separate scripts defined and encoded in the Unicode Standard will, in the default collation table, get separate primary weighting. That is the basic pattern followed in the table, and is the most conservative approach, since it does not presume removal of distinctions for the default.

In my opinion, the structure of the collation table should not, however, be the main consideration which goes into determining:

A. Whether a particular historic variant of some writing system should be separately encoded.
(Meaning: does the graphological analysis in the context of character encoding suggest that separate encoding makes more sense than unification with something else already encoded?)

B. Whether, given a technical determination in (A) that a separate script encoding is warranted, it should be encoded at all. (Meaning: is there any actual scholarly need for an encoding of that particular form, or would encoding simply be an exercise in script coverage completeness, without any actual application?)

For "Aramaic", it isn't clear to me that we have consensus yet about either of these "shoulds".

> But if you are saying that it must be all or nothing, I will
> continue to fight on behalf of the users of these scripts for all of
> what they want, rather than what you have apparently unilaterally (on
> the basis of a book which describes glyph shape differences rather than
> the systematic differences which really distinguish scripts) decided
> that they ought to want and have written into your Roadmap.

Them's fightin' words.

Howzabout, as Michael suggested, we simply cool it a little about Aramaic? Ancient forms of Aramaic aren't going to be taken up anytime soon for any consideration for encoding. And the Roadmap cannot be taken as a predetermination of the eventual decisions in this regard, in my opinion.

If there is, however, some consensus that Samaritan and Manichaean *do* deserve separate encoding consideration, how about pursuing encoding proposals for those as distinct scripts, and then coming back around later to review the ancient forms once again after some more of the pieces have fallen into place?

In the meantime, rather than harumphalating that Aramaic scholars are being confused by the Unicode Roadmap, I think it would serve everyone much better if someone knowledgeable about Aramaic scholars' text encoding needs and practices (you and others contributing to this discussion on the Hebrew list in particular?)
would write up a "Guide to Best Practices for Aramaic Text Representation Using Unicode" and publish it as a Unicode Technical Note. Then people could refer to and be referred to *that*, instead of puzzling over a bunch of sketchy, possible script encoding assignments on the Roadmap which may or may not represent anything that will ever actually be encoded in this area.

--Ken
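
As a footnote to the tailoring suggestion above: the following is a minimal Python sketch of the idea of giving Syriac letters the same matching weight as square Hebrew letters. It is not a UCA tailoring and does not use ICU. It simply folds each of the 22 basic Syriac letters onto its Hebrew counterpart, folds Hebrew final forms onto their base letters, and drops nonspacing marks, which roughly imitates a primary-strength comparison. The names `search_key` and `matches`, and the one-to-one letter pairing, are assumptions made for illustration only.

```python
import unicodedata

# The 22 basic Syriac letters (alaph .. taw) and their Hebrew counterparts
# (alef .. tav), in the same alphabetical order. Assumed pairing, for illustration.
SYRIAC = ("\u0710\u0712\u0713\u0715\u0717\u0718\u0719\u071A\u071B\u071D\u071F"
          "\u0720\u0721\u0722\u0723\u0725\u0726\u0728\u0729\u072A\u072B\u072C")
HEBREW = ("\u05D0\u05D1\u05D2\u05D3\u05D4\u05D5\u05D6\u05D7\u05D8\u05D9\u05DB"
          "\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6\u05E7\u05E8\u05E9\u05EA")

# Hebrew final forms folded onto their base letters (kaf, mem, nun, pe, tsadi).
HEBREW_FINALS = {
    "\u05DA": "\u05DB",
    "\u05DD": "\u05DE",
    "\u05DF": "\u05E0",
    "\u05E3": "\u05E4",
    "\u05E5": "\u05E6",
}

# Translation table: code point -> replacement character.
FOLD = {ord(s): h for s, h in zip(SYRIAC, HEBREW)}
FOLD.update({ord(f): b for f, b in HEBREW_FINALS.items()})


def search_key(text):
    """Build a script-neutral key: strip nonspacing marks, fold letters."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")
    return base.translate(FOLD)


def matches(query, candidate):
    """True if the query occurs in the candidate, ignoring script and points."""
    return search_key(query) in search_key(candidate)


if __name__ == "__main__":
    hebrew_word = "\u05DE\u05DC\u05DB\u05D0"   # mem, lamed, kaf, alef
    syriac_word = "\u0721\u0720\u071F\u0710"   # mim, lamadh, kaph, alaph
    print(matches(hebrew_word, syriac_word))
```

A key built this way could be stored in an indexed column and searched directly, which is the stored-key approach mentioned above for off-the-shelf databases. A real implementation would more likely express the same equivalences as collation rules in a tool such as ICU.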

