On Mon, 21 Oct 2013 00:33:58 +0530 Pravin Jain <[email protected]> wrote:

I've taken the liberty of replying to the list.

> One observation for Indic scripts.
> +U0933 normally comes after +U0939, in dictionary, except for this all
> other code points are properly ordered.
> similarly in the Gujarati block
> +U0AB3 comes after +U0AB9.

This is different to any issue of 'logical order'; this point relates to the code values used, rather than the order in which the codepoints of a string are stored. Sorting for human consumption normally uses lookup tables for the comparison of characters, and these should handle this issue.

However, for the range U+0933 to U+0939 the collation order is the same as the codepoint order in the Default Unicode Collation Element Table (DUCET), which is controlled by the Unicode Technical Committee, and in the CLDR default and Hindi collation tables, which are controlled by the CLDR technical committee. I am surprised that this has not been corrected - the corresponding codepoint, when it exists, comes in the alphabetical order you describe in the Buddhist Indic scripts.

Assuming the current collations are wrong, please raise a ticket at http://unicode.org/cldr/trac/newticket and point to some evidence, e.g. an image of entries in a printed dictionary. It may be worth reporting the issue against DUCET at http://www.unicode.org/reporting.html ; however, it may be argued that this is not a sufficiently egregious error for it to be corrected. If you do report it for DUCET, please reference the CLDR ticket number.

Richard.
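
To illustrate the point about lookup tables, here is a minimal Python sketch. It assumes the dictionary order from the quoted observation (U+0933 sorting after U+0939) purely for the sake of example; the names `sort_weight` and `dictionary_key` are hypothetical, and this is a single-level toy, not the multi-level comparison that DUCET or a CLDR tailoring actually performs.

```python
# Toy lookup-table sort: every character gets a weight, and strings are
# compared by their weight sequences instead of by raw code values.
# ASSUMPTION: U+0933 should sort immediately after U+0939, as claimed in
# the quoted observation; this has not been checked against a dictionary.

LLA = 0x0933  # DEVANAGARI LETTER LLA
HA = 0x0939   # DEVANAGARI LETTER HA


def sort_weight(ch):
    """Return a sort weight for one character (hypothetical helper)."""
    cp = ord(ch)
    if cp == LLA:
        # Place LLA just after HA without disturbing any other weight.
        return HA + 0.5
    # All other characters keep their code value as their weight.
    return cp


def dictionary_key(word):
    """Map a string to its sequence of weights (hypothetical helper)."""
    return [sort_weight(ch) for ch in word]


words = ["\u0939", "\u0933", "\u0932", "\u0935"]

# Plain codepoint order: U+0932, U+0933, U+0935, U+0939.
by_codepoint = sorted(words)

# Table-driven order: U+0932, U+0935, U+0939, U+0933.
by_table = sorted(words, key=dictionary_key)
```

The stored strings are unchanged in both cases; only the comparison differs, which is why a correction of this kind belongs in the collation tables rather than in the code values.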

