On Tue, 11 Mar 2003, Markus Scherer wrote:

> The Unicode Collation Algorithm (UCA) for which allkeys.txt is the
> default weight table does treat ZWNJ and a number of other characters as
> special. For these, they are completely ignored by the UCA - same as if
> you stripped them from the text.

Well, anything that is completely ignored in collation creates problems
with deterministic sorting.  There are certain words in Persian, with
completely different meanings, that differ only in a ZWNJ[1].  If ZWNJ is
ignored by default, such words compare as equal and may appear in either
order, possibly depending on the original order of input.  I guess this is
not what we want for deterministic collation.
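
To make the problem concrete, here is a minimal Python sketch.  It assumes
a toy key function, ignoring_key (a hypothetical helper, not the real UCA),
that simply strips ZWNJ before comparing.  Two words that differ only in a
ZWNJ then get equal keys, and a stable sort just returns them in input order:

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def ignoring_key(word):
    # Toy stand-in for a collation key that ignores ZWNJ completely.
    # Hypothetical helper for illustration; this is not the UCA.
    return word.replace(ZWNJ, "")

# The two spellings of "names of" from the footnote, written with escapes
# so the ZWNJ is visible in the source.
with_zwnj = "\u0646\u0627\u0645\u200c\u0647\u0627\u06cc"
without_zwnj = "\u0646\u0627\u0645\u0647\u0627\u06cc"

# Python's sort is stable, so items with equal keys keep their input order.
order_a = sorted([with_zwnj, without_zwnj], key=ignoring_key)
order_b = sorted([without_zwnj, with_zwnj], key=ignoring_key)

print(order_a == order_b)  # False: the result depends on input order
```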

The desired behavior for ZWNJ is to be treated like punctuation:
ignored at the first levels, but considered at the last one. (Personal Note:
write something for UTC on this.)
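
A rough Python sketch of that idea, under loud assumptions: tiebreak_key is
a hypothetical two-level key, the first level is just the word with ZWNJ
removed, and the last level is plain code point order of the full string.
Real UCA weights are not involved; the point is only that a final level
which sees ZWNJ makes the order independent of the input order:

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def tiebreak_key(word):
    # First level: compare as if ZWNJ were absent.
    # Last level: fall back to the full code point sequence, so ZWNJ counts.
    # Hypothetical helper for illustration; this is not the UCA.
    return (word.replace(ZWNJ, ""), word)

# The three words from the footnote, written with escapes.
names_of = "\u0646\u0627\u0645\u200c\u0647\u0627\u06cc"
names_of_joined = "\u0646\u0627\u0645\u0647\u0627\u06cc"
a_letter = "\u0646\u0627\u0645\u0647\u200c\u0627\u06cc"

words = [names_of, names_of_joined, a_letter]
result_1 = sorted(words, key=tiebreak_key)
result_2 = sorted(reversed(words), key=tiebreak_key)

print(result_1 == result_2)  # True: same order whatever the input order
```

Which of the tied words comes first is an arbitrary consequence of using
code point order here; a real tailoring would decide that with proper
weights.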

roozbeh

[1] A good example is نام‌های or نامهای (names of) vs
نامه‌ای (a letter). The only difference in their encoding is the
presence or absence of a ZWNJ, or its position in the word.

