Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

Markus Scherer Mon, 21 May 2012 17:17:41 -0700

On Mon, May 21, 2012 at 4:37 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:


> What are the definitions of upper and lower case for the caseFirst
> tailoring for the UCA and for LDML?  I can't find any obvious
> definition.
>

I am having trouble finding a published definition too. I suggest you
submit a CLDR ticket for this. http://unicode.org/cldr/trac/newticket

In principle, it's straightforward: Lowercase and uppercase follow Unicode
(UCD) case properties. We distinguish an intermediate "mixed case" for
titlecase characters and mixed-case contractions. I believe we also
distinguish small/normal Kana as lowercase/uppercase. I can dig up the ICU
code that computes the collation case bits for a string.
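
Until I find the real code, here is a rough sketch of the idea in Python. It is not the ICU implementation. The names (LOWER, MIXED, UPPER, char_case, string_case) are made up for illustration, it treats uncased characters as lowercase by assumption, and it leaves out the Kana distinction entirely.

```python
import unicodedata

# Hypothetical case classes, ordered lowercase < mixed < uppercase.
LOWER, MIXED, UPPER = 0, 1, 2

def char_case(ch):
    """Classify one character by its UCD general category (a simplification)."""
    cat = unicodedata.category(ch)
    if cat == "Lu":
        return UPPER
    if cat == "Lt":          # titlecase letters such as U+01C5 count as mixed
        return MIXED
    return LOWER             # assumption: lowercase and uncased both map to LOWER

def string_case(s):
    """Classify a string (for example a contraction) as lower, mixed, or upper."""
    # Only cased letters take part in the decision.
    cased = {char_case(ch) for ch in s
             if unicodedata.category(ch) in ("Lu", "Ll", "Lt")}
    if not cased:
        return LOWER
    if cased == {UPPER}:
        return UPPER
    if cased == {LOWER}:
        return LOWER
    return MIXED             # any titlecase letter or mix of upper and lower
```

With this sketch, a contraction like "ch" comes out as lowercase, "Ch" as mixed case, and "CH" as uppercase.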

I don't know whether CLDR/LDML should require all of the details, but there
should at least be informative documentation.

When you turn on the case level or use a caseFirst option, these case bits
are used before (or instead of) the tertiary weights. When you use "normal"
3-level sorting, the case bits are ignored and only the tertiary weights
are used.
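
As a toy model of that ordering (again not ICU code): the sketch below builds a sort key from levels, and inserts a case level ahead of the tertiary level only when asked to. The function names, the parameters, and the weights are all assumptions for illustration. The secondary level is omitted, and the "tertiary" weight here is just a stand-in that happens to reflect case.

```python
import unicodedata

def case_bit(ch):
    """Toy case bit: 0 = lowercase or uncased, 1 = titlecase, 2 = uppercase."""
    return {"Lt": 1, "Lu": 2}.get(unicodedata.category(ch), 0)

def toy_key(s, case_first=None, case_level=False, tertiary=True):
    """Build a toy multi-level sort key (hypothetical, for illustration only)."""
    raw = tuple(case_bit(ch) for ch in s)
    # Primary level: letters with case differences removed.
    levels = [tuple(ch.casefold() for ch in s)]
    if case_first or case_level:
        # Case bits come before the tertiary weights; "upper" flips their order.
        case = tuple(2 - c for c in raw) if case_first == "upper" else raw
        levels.append(case)
    if tertiary:
        # Stand-in tertiary weights; the real ones are based on a mix of criteria.
        levels.append(tuple(0 if c == 0 else 1 for c in raw))
    return tuple(levels)

words = ["b", "A", "a"]
normal = sorted(words, key=toy_key)
upper_first = sorted(words, key=lambda w: toy_key(w, case_first="upper"))
case_only = sorted(words, key=lambda w: toy_key(w, case_level=True, tertiary=False))
```

In the default call the case bits never enter the key. With case_first="upper" they decide the order of "A" and "a" before the tertiary weights are looked at, and with case_level=True and tertiary=False they are used instead of the tertiary weights.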

The tertiary weights themselves are separate, and based on a mix of
criteria.

Best regards,
markus
-- 
Google Internationalization Engineering
