Not a stupid question at all. 

The reason SpecialCasing.txt changes the case mapping
for dotted uppercase I is as follows:

Take any two strings that are *canonically equivalent*,
one in Normalization Form C (maximally composed) and one
in Normalization Form D (decomposed).  Now map both
strings to lowercase.  You would still expect the two
results to be canonically equivalent.  For that to
hold, the precomposed dotted uppercase I (U+0130) must map
to lowercase as an i followed by a combining dot above.
That is because lowercasing the decomposed version, an
"I" followed by a combining dot above, lowercases the "I"
but does not remove the combining dot above.  Removing
the dot would have been a viable alternative, but that
is done only for Turkish and Azeri, for which a dot
above is also introduced (precomposed, as U+0130) when
uppercasing an "i".  See towards the end of
SpecialCasing.txt.
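
To make the argument concrete, here is a minimal sketch in Python.
It assumes Python 3, whose str.lower() applies the unconditional
(not locale-specific) mappings from SpecialCasing.txt. The helper
canonically_equivalent() and the variable names are mine, for
illustration only; the Turkish/Azeri rules are not exercised here.

```python
import unicodedata

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, already in NFC.
composed = "\u0130"
# Its canonical decomposition: "I" followed by U+0307 COMBINING DOT ABOVE.
decomposed = unicodedata.normalize("NFD", composed)

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Lowercase both forms with the full (SpecialCasing.txt) mapping.
lower_composed = composed.lower()      # i + combining dot above
lower_decomposed = decomposed.lower()  # i + combining dot above

# The simple one-to-one mapping shown in the charts would give a bare "i".
simple_lower = "\u0069"

print(canonically_equivalent(composed, decomposed))
print(canonically_equivalent(lower_composed, lower_decomposed))
print(canonically_equivalent(simple_lower, lower_decomposed))
```

The first two checks should hold and the third should not: with the
simple mapping, the combining dot survives in one result but not in
the other, so the two lowercased strings stop being equivalent.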

                /kent k


Teri Griopich wrote:

> There is a file named "SpecialCasing.txt" which can be found
> at the following URL:
> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
> 
> I quote the following two lines from the file
> SpecialCasing.txt:
> # Preserve canonical equivalence for I with dot. Turkic is handled below.
> 0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
> 
> The file "SpecialCasing.txt" says the lowercase letter of
> U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is "0069 0307",
> unless the locale under consideration is Turkish or Azeri.
> 
> However, the Case Mapping Charts
> (http://www.unicode.org/charts/case/) say U+0069 LATIN SMALL
> LETTER I is the lowercase letter of U+0130 LATIN CAPITAL
> LETTER I WITH DOT ABOVE.
> 
> I am confused???

> Thanks in advance,
> 
> Teri

