"Deterministic Sorting" (was Re: ZWNJ & Persian Collation)

Mark Davis Thu, 13 Mar 2003 13:43:30 -0800

I want to point out three things.

1. UCA provides a mechanism for producing a "deterministic" sort (there
called semi-stable). See step 3.10
(http://www.unicode.org/reports/tr10/#Step_3).


2. A "deterministic" sort is actually not needed very often; people confuse
it with a stable sort. See http://www.unicode.org/reports/tr10/#Stability

3. If someone did customize the UCA for numeric sorting, the difference
between 002 and 2 could be a tertiary difference. So even without using
3.10, they would be distinguished at level 3.
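
To make the three points concrete, here is a minimal sketch in Python. It is
not the UCA: toy_key, deterministic_key, and numeric_key are hypothetical
helpers that only imitate the behavior described above. toy_key treats ZWNJ as
completely ignorable, deterministic_key appends the NFD form of the string as
a final tie-breaker (in the spirit of step 3.10), and numeric_key compares
digit runs by value first and lets the written form (002 vs. 2) matter only at
a later level.

```python
import re
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def toy_key(s):
    # Hypothetical stand-in for a UCA sort key: ZWNJ is fully ignorable and
    # case is ignored, so distinct strings can get identical keys.
    return s.replace(ZWNJ, "").casefold()


def deterministic_key(s):
    # Same key, plus the NFD form of the string as a last-resort level, so
    # two different strings never compare as equal.
    return (toy_key(s), unicodedata.normalize("NFD", s))


def numeric_key(s):
    # Hypothetical numeric tailoring: digit runs compare by value first;
    # the length of the written digits only breaks ties afterwards.
    runs = re.findall(r"\d+|\D+", s)
    first = [(0, int(r), "") if r.isdecimal() else (1, 0, r) for r in runs]
    later = [len(r) for r in runs if r.isdecimal()]
    return (first, later)


words = ["ab", "a" + ZWNJ + "b", "AB"]

# Python's sort is stable: with equal keys, the output follows input order.
by_input_order = sorted(words, key=toy_key)
by_reversed_input = sorted(words[::-1], key=toy_key)

# With the tie-breaker, the output no longer depends on input order.
fixed_order = sorted(words, key=deterministic_key)
fixed_order_reversed = sorted(words[::-1], key=deterministic_key)

# 002 and 2 are equal by value and differ only at the later level.
numbers = sorted(["a10", "a002", "a2"], key=numeric_key)
```

The stable sort gives a different result for each input order because all
three toy keys are equal; the tie-breaker version gives the same result for
both. Which of the two you need depends on whether the order of equal items
matters to you at all.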

Mark
________
[EMAIL PROTECTED]
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Markus Scherer" <[EMAIL PROTECTED]>
To: "unicode" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, March 12, 2003 08:48
Subject: Re: ZWNJ & Persian Collation


> Roozbeh Pournader wrote:
> > Well, anything that is completely ignored in collation creates problems
> > with deterministic sorting.
>
> I don't think you mean "deterministic". UCA is deterministic, it just
> sorts many strings as equal.
>
> > There are certain words in Persian, with
> > completely different meanings, that only differ in a ZWNJ[1].  Having ZWNJ
> > ignored by default, means they may appear in this or that order, possibly
> > based on the original order of input.  I guess this is not what we want
> > for deterministic collation.
> >
> > The desired behavior for ZWNJ is to be treated like punctuation:
> > ignored in the first levels, but considered at the end. (Personal Note:
> > write something for UTC on this.)
>
> Possible. I assume that ZWNJ is ignored in UCA because that is the
> expected behavior for many other languages. Not ignoring ZWNJ is possible
> with a tailoring that gives it some non-zero weights.
>
> Note that many languages require tailorings for at least a couple of
> characters to follow national standards.
>
> markus
>
> --
> Opinions expressed here may not reflect my company's positions unless
> otherwise noted.
>
>
>
