https://bugzilla.wikimedia.org/show_bug.cgi?id=30675

--- Comment #4 from Philippe Verdy <[email protected]> 2011-09-12 01:37:15 UTC ---
The CLDR-modified DUCET basically changes only the relative order of
primary weights. But yes it includes some notable differences for things like
currency symbols.
In the CLDR version, the Rupee sign will no longer sort with Latin letters,
meaning that it will no longer be decomposed and that its first primary weight
will now be distinct from the primary weight given to Latin letter R. This also
means that the "first letter" will need to be made different.
To implement the "first letter", you need to do it consistently with the
collation order, so the Rupee sign will need to use the Rupee sign itself as
its "first letter", instead of Latin small letter r.
You can infer the "first letter" from the DUCET, by looking at the first
collation element that has the same primary weight and the smallest weights for
the next levels. But to get a fully ordered list, necessary to make such
determination, you first need to decide what to do with variable elements:
should they all sort with primary weights, or as ignorables? This choice
radically changes the ordered sequence of collation elements and which "first
letter" you'll get (note that variable elements do not interleave in the DUCET,
at least for the first primary weight when they are expansions, but this is not
necessarily the case with locale-specific tailorings).
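
As a rough illustration of that lookup, here is a minimal Python sketch. It
assumes a hypothetical toy table (TOY_TABLE) whose weights are invented and are
not real DUCET or CLDR values, and the helper names (sort_key, first_primary,
first_letter) are made up for this example. It only shows the idea: among all
entries sharing the same first primary weight, pick the one that sorts lowest
on the remaining levels.

```python
# Toy collation table: character -> list of collation elements,
# each element being (primary, secondary, tertiary).
# All weights are invented for illustration; they are NOT real DUCET values.
TOY_TABLE = {
    "r": [(0x20, 0x01, 0x01)],
    "R": [(0x20, 0x01, 0x02)],
    "ŕ": [(0x20, 0x01, 0x01), (0x00, 0x05, 0x01)],  # r + acute, as an expansion
    "s": [(0x21, 0x01, 0x01)],
    "S": [(0x21, 0x01, 0x02)],
    "₨": [(0x10, 0x01, 0x01)],  # CLDR-style: its own primary, not sorted as "Rs"
}


def sort_key(char, table):
    """Level-by-level key: all primaries, then secondaries, then tertiaries.

    Zero weights are dropped, since a zero means "ignorable at this level".
    """
    elements = table[char]
    return tuple(
        tuple(e[level] for e in elements if e[level]) for level in range(3)
    )


def first_primary(char, table):
    """Return the first non-zero primary weight of char, or None."""
    for element in table[char]:
        if element[0] != 0:
            return element[0]
    return None


def first_letter(char, table):
    """Pick the lowest-sorting entry that shares char's first primary weight."""
    target = first_primary(char, table)
    if target is None:
        return None
    candidates = [c for c in table if first_primary(c, table) == target]
    return min(candidates, key=lambda c: sort_key(c, table))
```

With this toy data, "R" and "ŕ" both get "r" as their heading, while "₨" becomes
its own heading because no other entry shares its primary weight.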

One example: U+0060 (the ASCII "GRAVE ACCENT") has a possible tailored
decomposition as SPACE+COMBINING GRAVE ACCENT, in which case it would sort with
SPACE, with only a secondary difference of accent (that is, using an expansion).
In that case, its "first letter" would become the SPACE, and not itself. There
are more complex cases of "variable collation elements" that need special
handling in tailorings, for "Modifier Letters", or for Hebrew and Tibetan
"cantillation marks", or for Braille patterns. For these cases, you must be
extremely careful about how you compute the "first letter", or it will be
completely out of sync with the collation order.
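
To make the U+0060 case concrete, here is a small Python sketch under stated
assumptions: the weights in TOY_TABLE are invented, VARIABLE_PRIMARIES is a
hypothetical set marking which primaries count as "variable", and the
"ignorable" mode is a simplification (variable elements and the ignorable
elements that follow them are simply dropped, roughly in the spirit of a
"blanked" treatment). It is not an implementation of the UCA.

```python
# Hypothetical toy weights (primary, secondary, tertiary); not real DUCET data.
TOY_TABLE = {
    " ": [(0x02, 0x01, 0x01)],
    "`": [(0x02, 0x01, 0x01), (0x00, 0x07, 0x01)],  # SPACE + COMBINING GRAVE
    "a": [(0x30, 0x01, 0x01)],
    "à": [(0x30, 0x01, 0x01), (0x00, 0x07, 0x01)],  # a + COMBINING GRAVE
}
VARIABLE_PRIMARIES = {0x02}  # assumption: only SPACE's primary is variable


def effective_elements(char, variable_as_ignorable):
    """Return the collation elements that still count for sorting."""
    result = []
    after_variable = False
    for primary, secondary, tertiary in TOY_TABLE[char]:
        if variable_as_ignorable and primary in VARIABLE_PRIMARIES:
            after_variable = True
            continue  # the variable element itself is dropped
        if variable_as_ignorable and after_variable and primary == 0:
            continue  # an ignorable right after a variable is dropped too
        after_variable = False
        result.append((primary, secondary, tertiary))
    return result


def sort_key(char, variable_as_ignorable):
    """Level-by-level key built from the effective elements."""
    elements = effective_elements(char, variable_as_ignorable)
    return tuple(
        tuple(e[level] for e in elements if e[level]) for level in range(3)
    )


def first_letter(char, variable_as_ignorable):
    """Lowest-sorting entry sharing char's first primary, or None."""
    key = sort_key(char, variable_as_ignorable)
    if not key[0]:
        return None  # no primary weight left, so no heading can be derived
    lead = key[0][0]
    candidates = [
        c for c in TOY_TABLE
        if sort_key(c, variable_as_ignorable)[0][:1] == (lead,)
    ]
    return min(candidates, key=lambda c: sort_key(c, variable_as_ignorable))
```

When variable elements keep their primary weights, "`" is filed under SPACE;
when they are treated as ignorable, it has no primary weight left and the
function returns None, so the caller has to decide where such entries go.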
