On Tue, 11 Mar 2003, Markus Scherer wrote:

> The Unicode Collation Algorithm (UCA) for which allkeys.txt is the
> default weight table does treat ZWNJ and a number of other characters as
> special. For these, they are completely ignored by the UCA - same as if
> you stripped them from the text.

Well, anything that is completely ignored in collation creates problems for deterministic sorting. There are certain words in Persian, with completely different meanings, that differ only in a ZWNJ[1]. If ZWNJ is ignored by default, such words compare as equal and may appear in either order, possibly depending on the original order of the input. I guess this is not what we want for deterministic collation.

The desired behavior for ZWNJ is to be treated like punctuation: ignored at the first levels, but considered at the end.

(Personal note: write something for UTC on this.)

roozbeh

[1] A good example is نام‌های or نامهای (names of) vs. نامه‌ای (a letter). The only difference in their encoding is the presence or absence of a ZWNJ, or its position in the word.
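
To make the concern above concrete, here is a minimal Python sketch. It is not the UCA and does not use allkeys.txt: `ignoring_key` and `tiebreak_key` are hypothetical toy keys, and falling back to raw code point order as the last level is only an assumption chosen for illustration. It relies on Python's `sorted` being stable, so items with equal keys keep their input order.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def ignoring_key(s):
    # Toy key (not the UCA): ZWNJ is dropped completely,
    # as if it had been stripped from the text.
    return s.replace(ZWNJ, "")

def tiebreak_key(s):
    # Toy key (an assumption, not the UCA): compare with ZWNJ dropped first,
    # then fall back to the raw string as a final level.
    return (s.replace(ZWNJ, ""), s)

# The two words from footnote [1], spelled out as escapes so the ZWNJ is visible.
names_of = "\u0646\u0627\u0645\u200c\u0647\u0627\u06cc"  # "names of"
a_letter = "\u0646\u0627\u0645\u0647\u200c\u0627\u06cc"  # "a letter"

forward = [names_of, a_letter]
backward = [a_letter, names_of]

# With ZWNJ ignored, both words get the same key, so the stable sort
# simply preserves whatever order the input happened to have.
ignored_fwd = sorted(forward, key=ignoring_key)
ignored_bwd = sorted(backward, key=ignoring_key)
print(ignored_fwd == ignored_bwd)

# With a final tie-breaking level, the result no longer depends on input order.
tb_fwd = sorted(forward, key=tiebreak_key)
tb_bwd = sorted(backward, key=tiebreak_key)
print(tb_fwd == tb_bwd)
```

The first comparison shows the two input orders producing different outputs; the second shows them agreeing once ZWNJ is considered at the end. Which of the two words should come first at that last level is a separate question that this sketch does not try to answer.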