John O'Conner wrote:
> I intend on testing this with a few perl scripts later using the db, but
> thought I'd pose the question to see if anyone has a quick answer:
> 
> Is it true that if a character's general category is neither Ll nor Lt,
> then the uppercase character is simply the character itself?

If I were a user of your API, I would expect any character that has an
uppercase mapping (field #12 in UnicodeData.txt) to be uppercased,
whether or not it is a lowercase (or titlecase) letter.

Roman numerals (class Nl):
        2170;SMALL ROMAN NUMERAL ONE;Nl;0;L;<compat> 0069;;;1;N;;;2160;;2160
        ...
        217F;SMALL ROMAN NUMERAL ONE THOUSAND;Nl;0;L;<compat> 006D;;;1000;N;;;216F;;216F

Circled letters (class So):
        24D0;CIRCLED LATIN SMALL LETTER A;So;0;L;<circle> 0061;;;;N;;;24B6;;24B6
        ...
        24E9;CIRCLED LATIN SMALL LETTER Z;So;0;L;<circle> 007A;;;;N;;;24CF;;24CF

And even a diacritic mark (class Mn) that changes to a letter (Lu) when
uppercased:
        0345;COMBINING GREEK YPOGEGRAMMENI;Mn;240;NSM;;;;;N;GREEK NON-SPACING IOTA BELOW;;0399;;0399

(Note: these examples are from an outdated version of UnicodeData.txt.)
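
A rough sketch of the lookup I have in mind, in Python rather than Perl: it
keys the uppercase mapping on field #12 alone and ignores the general
category. The three records are copied from the examples above (so they
reflect the same outdated file), and the names RECORDS, build_upper_map and
to_upper are made up for illustration, not part of any existing API.

```python
# Sample records copied from the examples above (old UnicodeData.txt layout).
RECORDS = [
    "2170;SMALL ROMAN NUMERAL ONE;Nl;0;L;<compat> 0069;;;1;N;;;2160;;2160",
    "24D0;CIRCLED LATIN SMALL LETTER A;So;0;L;<circle> 0061;;;;N;;;24B6;;24B6",
    "0345;COMBINING GREEK YPOGEGRAMMENI;Mn;240;NSM;;;;;N;"
    "GREEK NON-SPACING IOTA BELOW;;0399;;0399",
]


def build_upper_map(lines):
    """Hypothetical helper: code point -> (general category, uppercase)."""
    table = {}
    for line in lines:
        fields = line.split(";")
        upper = fields[12]  # field #12: uppercase mapping, may be empty
        if upper:
            table[int(fields[0], 16)] = (fields[2], int(upper, 16))
    return table


UPPER = build_upper_map(RECORDS)


def to_upper(code_point):
    """Return the uppercase mapping if one is listed, else the input."""
    entry = UPPER.get(code_point)
    return entry[1] if entry else code_point
```

With this table, to_upper(0x2170), to_upper(0x24D0) and to_upper(0x0345) all
return a different code point even though their categories are Nl, So and Mn,
while a code point with no entry comes back unchanged.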

_ Marco
