Re: Sort Order

Kenneth Whistler Thu, 04 Dec 2003 17:14:59 -0800

Mustafa Jabbar inquired:

> Please also inform me about what will be the sorting for Bangla.
> Thanks and regards


The Unicode Standard is *not* a sorting standard -- nor is any
character encoding.

The reason it might occasionally seem to be one is that there is
a long history of people fiddling with the exact details of a
character encoding, trying to order the characters so that
dumb binary comparison algorithms will produce the "correct" results
for pairs of strings in that particular encoding.

The general consensus is, however, that it is impossible to
accomplish meaningful linguistic sorting simply by tinkering
around with the character encoding tables. See Section 5.16 of
the standard for a brief discussion of this issue.
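
As a small illustration of the point (a sketch only, using Latin
words rather than Bangla, and nothing taken from the standard itself),
here is what a plain code point comparison does in Python. The word
list is made up for the example:

```python
# Illustration only: Python's default string comparison is by code point,
# which is the kind of "dumb binary comparison" discussed above.
words = ["apple", "Zebra", "\u00e9clair"]  # "\u00e9" is precomposed e-acute

binary_order = sorted(words)

# Show the first code point of each word to explain the resulting order.
first_code_points = [(w, hex(ord(w[0]))) for w in binary_order]

print(binary_order)
print(first_code_points)
```

Uppercase "Z" (0x5a) comes before lowercase "a" (0x61), and the accented
letter (0xe9) comes after every unaccented one, so "Zebra" sorts first and
the accented word last. A dictionary user would typically expect the
accented word between "apple" and "Zebra". No rearrangement of the code
chart fixes this for every language at once.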

For the relevant collation standard, see instead:

http://www.unicode.org/reports/tr10/

That is the Unicode Collation Algorithm (UCA). That standard explains
how to accomplish culturally expected sorting and defines an
algorithm and default table to use for it.

That *still* is not the answer for how Bangla will be sorted,
however. One has to make *use* of the Unicode Collation Algorithm
and then tailor the default table until it produces the
results desired.
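
The idea of "a default table, then tailoring" can be sketched roughly
as follows. This is a toy, single-level model and is NOT the UCA or its
default table: the names DEFAULT_WEIGHTS, make_sort_key, and tailor are
hypothetical, and the Latin example stands in for whatever ordering a
Bangla tailoring would actually need.

```python
# Toy model only: one weight per character, no multi-level comparison,
# no contractions or expansions. The real UCA is considerably richer.
DEFAULT_WEIGHTS = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}


def make_sort_key(weights):
    """Build a key function that maps a string to a list of weights."""
    def key(s):
        # Characters missing from the table sort after all listed ones,
        # in code point order (an assumption made for this sketch).
        return [weights.get(ch, len(weights) + ord(ch)) for ch in s.lower()]
    return key


def tailor(weights, char, after):
    """Return a copy of the table in which `char` sorts just after `after`."""
    tailored = dict(weights)
    tailored[char] = weights[after] + 0.5
    return tailored


words = ["zebra", "\u00e4pple", "apple"]  # "\u00e4" is a-diaeresis

default_order = sorted(words, key=make_sort_key(DEFAULT_WEIGHTS))

# Tailoring: place a-diaeresis immediately after "a".
tailored_weights = tailor(DEFAULT_WEIGHTS, "\u00e4", after="a")
tailored_order = sorted(words, key=make_sort_key(tailored_weights))

print(default_order)
print(tailored_order)
```

With the untailored toy table the a-diaeresis word lands after "zebra";
after tailoring it lands between "apple" and "zebra". The same data thus
sorts differently depending on the tailoring, which is why the encoding
alone cannot say how Bangla "will be sorted".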

So the question which should be asked is: Has anyone produced
a UCA-based collation for Bangla, and if so, what behavior
does it have for sorting Bangla data?

See also the discussion of sorting issues for Indic languages
in Cathy Wissink's technical note:

http://www.unicode.org/notes/tn1/

--Ken
