On 28/07/2003 19:05, Kenneth Whistler wrote:

...

This is, of course, precisely the desired result -- the CGJ is
ignored for weighting, but its presence prevents the reordering
of the vowels into the undesired sequence by normalization.
And the resulting collation key weights the vowels in the correct
order.
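The specific example from the quoted message is elided above, so what follows is only a sketch under an assumed input: lamed with patah followed by hiriq, a sequence that canonical reordering disturbs. It uses Python's standard unicodedata module to show the mechanism Ken describes. CGJ (U+034F) has combining class 0, so normalization does not reorder marks across it.

```python
import unicodedata

# Assumed example: lamed + patah + hiriq (not necessarily the elided original).
LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

# Canonical combining classes drive the reordering done by normalization.
for name, ch in [("patah", PATAH), ("hiriq", HIRIQ), ("CGJ", CGJ)]:
    print(name, unicodedata.combining(ch))

plain = LAMED + PATAH + HIRIQ
joined = LAMED + PATAH + CGJ + HIRIQ

# Without CGJ, hiriq (class 14) is moved ahead of patah (class 17).
print(unicodedata.normalize("NFD", plain) == LAMED + HIRIQ + PATAH)
# With CGJ (class 0) between them, the intended order survives.
print(unicodedata.normalize("NFD", joined) == joined)
```

The loop reports classes 17, 14 and 0, and both comparisons print True.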

Tailoring of the collation table could modify any of this, but
the above example is what you get just using the default table.

But it is important that people implementing searching and sorting
for Hebrew understand why and how the CGJ is "ignored" in this
context, in order to get correct results. For example, if you
strip the CGJ and *then* hand the string to the collation weighting
algorithm, normalization will again rearrange the points into
the wrong order for weighting.
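The order-of-operations pitfall can be illustrated with a toy key. This is not the Unicode Collation Algorithm: the hypothetical functions toy_key_correct and toy_key_wrong simply use code points as stand-in weights, and drop CGJ to mimic its being ignored for weighting. The input string is the same assumed lamed/patah/hiriq example, not necessarily the one from the elided message.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

def toy_key_correct(s):
    # Normalize first: CGJ is still present, so it blocks reordering.
    nfd = unicodedata.normalize("NFD", s)
    # Only then treat CGJ as ignorable when building the (toy) weights.
    return [ord(c) for c in nfd if c != CGJ]

def toy_key_wrong(s):
    # Stripping CGJ first lets normalization reorder the points again.
    stripped = s.replace(CGJ, "")
    nfd = unicodedata.normalize("NFD", stripped)
    return [ord(c) for c in nfd]

# Assumed example: lamed, patah, CGJ, hiriq.
word = "\u05DC\u05B7" + CGJ + "\u05B4"
print([hex(w) for w in toy_key_correct(word)])  # ['0x5dc', '0x5b7', '0x5b4']
print([hex(w) for w in toy_key_wrong(word)])    # ['0x5dc', '0x5b4', '0x5b7']
```

The correct version keeps patah before hiriq; the wrong one weights hiriq first, which is exactly the failure described above.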

--Ken

Thank you, Ken. In this particular case we might want to tailor the collation table so that this CGJ is effectively ignored. But I don't understand this aspect of Unicode well enough to know exactly what can be done.

--
Peter Kirk
http://web.onetel.net.uk/~peterkirk/