Re: Biblical Hebrew (U+034F Combining Grapheme Joiner works)

John Hudson Sat, 28 Jun 2003 00:33:33 -0700

At 07:10 PM 6/27/2003, Kenneth Whistler wrote:

Why? The point is that:

<patah, CGJ, hiriq>

is one thing, and

<hiriq, CGJ, patah>

is another. You *want* those sequences to be distinct, right? Even
if the text has been normalized, right? That was the whole
problem with:

   <patah, hiriq>
   <hiriq, patah>

which are canonically equivalent, since they both normalize to:

<hiriq, patah>

So the CGJ *is* significant for searching (and sorting). If you
want one sequence, you search for <patah, CGJ, hiriq>, if you
want the other, you search for <hiriq, CGJ, patah>. If you
don't care, and want to find either, *then* you strip out the
CGJ and normalize before comparison.
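
For anyone who wants to check this behaviour, here is a minimal sketch using Python's standard unicodedata module. The helper names (normalize, same_sequence, same_marks) are made up for illustration and are not taken from any library or from the messages in this thread. It assumes the Unicode data shipped with Python gives patah combining class 17, hiriq class 14, and CGJ class 0, so that CGJ blocks canonical reordering.

```python
import unicodedata

PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0


def normalize(text):
    """Hypothetical helper: apply NFC, which canonically reorders marks."""
    return unicodedata.normalize("NFC", text)


def same_sequence(a, b):
    """Order-sensitive comparison: CGJ is kept, so mark order survives."""
    return normalize(a) == normalize(b)


def same_marks(a, b):
    """Order-insensitive comparison: strip CGJ first, then normalize."""
    return normalize(a.replace(CGJ, "")) == normalize(b.replace(CGJ, ""))


plain_a = PATAH + HIRIQ          # <patah, hiriq>
plain_b = HIRIQ + PATAH          # <hiriq, patah>
joined_a = PATAH + CGJ + HIRIQ   # <patah, CGJ, hiriq>
joined_b = HIRIQ + CGJ + PATAH   # <hiriq, CGJ, patah>

print(same_sequence(plain_a, plain_b))    # without CGJ the two orders collapse
print(same_sequence(joined_a, joined_b))  # with CGJ they stay distinct
print(same_marks(joined_a, joined_b))     # stripping CGJ matches either order
```

The first comparison treats the two plain sequences as equal because normalization reorders them; the second keeps them apart because CGJ sits between the marks; the third is the "don't care" search, where CGJ is stripped before normalizing.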

I think Peter's point may be that scholars searching for patah followed by hiriq are most likely to search for <patah, hiriq>, and frankly who can blame them? This is what they see in the printed text, and it is what, hopefully, they would be able to input. So again we're looking at a solution that is only as attractive as the ability to hide it from users.

I am working on some exhaustive documentation of the normalisation problems affecting Hebrew mark ordering, which will ensure that we have a good grasp of the extent of the problem and a clear view of all the permutations that need to be taken into account by any solution.

John Hudson

Tiro Typeworks          www.tiro.com
Vancouver, BC           [EMAIL PROTECTED]

If you browse in the shelves that, in American bookstores,
are labeled New Age, you can find there even Saint Augustine,
who, as far as I know, was not a fascist. But combining Saint
Augustine and Stonehenge -- that is a symptom of Ur-Fascism.
                                                            - Umberto Eco
