Why? The point is that:
<patah, CGJ, hiriq>
is one thing, and
<hiriq, CGJ, patah>
is another. You *want* those sequences to be distinct, right? Even if the text has been normalized, right? That was the whole problem with:
<patah, hiriq> <hiriq, patah>
which are canonically equivalent, since they both normalize to:
<hiriq, patah>
So the CGJ *is* significant for searching (and sorting). If you want one sequence, you search for <patah, CGJ, hiriq>; if you want the other, you search for <hiriq, CGJ, patah>. If you don't care, and want to find either, *then* you strip out the CGJ and normalize before comparison.
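A minimal sketch of that behaviour, using Python's standard unicodedata module. The lamed base letter and the helper names strict_key and loose_key are hypothetical, chosen only for illustration; the code points are U+05B7 (patah, combining class 17), U+05B4 (hiriq, combining class 14) and U+034F (CGJ, combining class 0).

```python
import unicodedata

LAMED = "\u05DC"  # base letter, used here only as an example carrier
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0


def strict_key(text: str) -> str:
    """Normalize only; a CGJ between marks blocks their reordering."""
    return unicodedata.normalize("NFD", text)


def loose_key(text: str) -> str:
    """Strip CGJ, then normalize, so either mark order compares equal."""
    return unicodedata.normalize("NFD", text.replace(CGJ, ""))


# Without CGJ the two orders are canonically equivalent:
# both normalize to <hiriq, patah>.
plain_a = LAMED + PATAH + HIRIQ
plain_b = LAMED + HIRIQ + PATAH
assert strict_key(plain_a) == strict_key(plain_b) == LAMED + HIRIQ + PATAH

# With CGJ the two orders survive normalization and stay distinct.
joined_a = LAMED + PATAH + CGJ + HIRIQ
joined_b = LAMED + HIRIQ + CGJ + PATAH
assert strict_key(joined_a) == joined_a
assert strict_key(joined_a) != strict_key(joined_b)

# "Don't care" comparison: strip CGJ and normalize before comparing.
assert loose_key(joined_a) == loose_key(joined_b)
```

Comparing on strict_key distinguishes the two sequences; comparing on loose_key finds either.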
I think Peter's point may be that scholars searching for patah followed by hiriq are most likely to search for <patah, hiriq>, and frankly who can blame them? This is what they see in the printed text, and it is what, hopefully, they would be able to input. So again we're looking at a solution that is only as attractive as the ability to hide it from users.
I am working on some exhaustive documentation of the normalisation problems affecting Hebrew mark ordering, which will ensure that we have a good grasp of the extent of the problem and a clear view of all the permutations that need to be taken into account by any solution.
John Hudson
Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED]
If you browse in the shelves that, in American bookstores,
are labeled New Age, you can find there even Saint Augustine,
who, as far as I know, was not a fascist. But combining Saint
Augustine and Stonehenge -- that is a symptom of Ur-Fascism.
- Umberto Eco
