On Mon, May 21, 2012 at 4:37 PM, Richard Wordingham <richard.wording...@ntlworld.com> wrote:
> What are the definitions of upper and lower case for the caseFirst
> tailoring for the UCA and for LDML? I can't find any obvious
> definition.

I am having trouble finding a published definition too. I suggest you submit a CLDR ticket for this:
http://unicode.org/cldr/trac/newticket

In principle, it's straightforward: Lowercase and uppercase follow Unicode (UCD) case properties. We distinguish an intermediate "mixed case" for titlecase characters and mixed-case contractions. I believe we also distinguish small/normal Kana as lowercase/uppercase.

I can dig up the ICU code that computes the collation case bits for a string. I don't know whether CLDR/LDML should require all of the details, but there should at least be informative documentation.

When you turn on the case level or use a caseFirst option, these case bits are used before (or instead of) the tertiary weights. When you use "normal" 3-level sorting, the case bits are ignored and only the tertiary weights are used. The tertiary weights themselves are separate, and based on a mix of criteria.

Best regards,
markus
--
Google Internationalization Engineering
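
As a rough illustration of the three-way case classification described above, here is a minimal Python sketch. It is not the ICU code and makes several assumptions: the names case_class and sort_key are hypothetical, uncased characters are treated as lowercase, str.casefold() is only a crude stand-in for real primary weights, and the small/normal Kana distinction is left out.

```python
import unicodedata

# Hypothetical case classes, ordered lowercase < mixed < uppercase.
LOWER, MIXED, UPPER = 0, 1, 2


def case_class(unit: str) -> int:
    """Classify a character or contraction as LOWER, MIXED, or UPPER.

    Assumption: uncased characters (digits, punctuation) count as LOWER.
    """
    classes = set()
    for ch in unit:
        if unicodedata.category(ch) == "Lt":  # titlecase letter
            classes.add(MIXED)
        elif ch.isupper():
            classes.add(UPPER)
        elif ch.islower():
            classes.add(LOWER)
    if not classes:
        return LOWER
    if len(classes) == 1:
        return next(iter(classes))
    return MIXED  # e.g. a contraction such as "Ch"


def sort_key(s: str, case_first: str = "lower"):
    """Toy key: case-folded text first, then per-character case classes."""
    bits = [case_class(ch) for ch in s]
    if case_first == "upper":
        # Flip the order so that uppercase sorts before lowercase.
        bits = [UPPER - b for b in bits]
    return (s.casefold(), bits)


words = ["B", "a", "A", "b"]
lower_first = sorted(words, key=sort_key)
upper_first = sorted(words, key=lambda w: sort_key(w, "upper"))
```

In this sketch the case classes only break ties between strings that are otherwise equal after case folding, which mirrors the idea that the case bits are consulted after the more significant levels; a real collator would compare UCA weights instead.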