> -----Original Message-----
> From: Doug Ewell [mailto:[EMAIL PROTECTED]
> Sent: Monday, 15 December 2003 17:32
> To: Unicode Mailing List
> Cc: [EMAIL PROTECTED]
> Subject: Re: Case mapping of dotless lowercase letters
>
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
> > I would have expected to find these mappings:
> >
> > 0131; F; 0069; # LATIN SMALL LETTER DOTLESS I
> > -> LATIN SMALL LETTER I
> > 0131; T; 0131; # LATIN SMALL LETTER DOTLESS I
> > -> LATIN SMALL LETTER DOTLESS I
> >
> > The rationale being that the locale-neutral mappings would not
> > differentiate the "standard" small letter (soft-dotted) i, and the
> > "Turkic" small letter dotless i, for the same reason that they do not
> > differentiate their uppercase versions; and that the "Turkic" mappings
> > should maintain this difference in both lowercase and uppercase pairs
> > of letters.
>
> Turkish and Azeri (and others) can only be cased correctly with
> locale-specific mappings. The locale-neutral mappings cannot be
> expected to consider U+0069 'i' and U+0131 'ı' equivalent, with all the
> ambiguities that would bring. As you point out, 'i' and 'ı' are quite
> different letters.

I agree with your argument about the difference between dotted and dotless letters. My point is that the current case mappings behave differently depending on whether words are compared in lowercase or in uppercase: the lowercase mappings preserve the distinction, while the uppercase mappings do not. As a consequence, two words that compare as distinct in lowercase no longer compare as distinct once they are converted to uppercase with the default locale-neutral full mappings. (This problem does not occur with the Turkic-specific full case mappings.)

That is all I am saying. I do not want to reform the case mappings for Turkic languages, only to point out a caveat in the default locale-neutral mappings.

In practice, I had to add these two mappings to my application, because the default (locale-neutral) case mappings caused identity problems, with security implications. The Turkic case mappings remain available as an option for documents explicitly labelled with the "tr" or "az" locales. The same is true of IDNA and of case-insensitive filesystems, which must also be locale-neutral and therefore need to remove the distinction between soft-dotted and dotless letters.

Are case foldings covered by the stability policy? Could an "F" entry be added to the CaseFolding.txt file for the default (locale-neutral) full mappings, along with a "T" entry that overrides it for Turkic languages, where the dotless lowercase i would map to itself?
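To make the caveat concrete, here is a minimal Python sketch. It assumes that Python's built-in `str.upper()` and `str.casefold()` stand in for the default locale-neutral mappings. The `fold` helper and its `turkic` flag are hypothetical names used only for illustration: they mimic the two extra entries discussed above and are not part of CaseFolding.txt or of any library.

```python
DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def fold(text: str, turkic: bool = False) -> str:
    """Hypothetical folding helper: default folding plus the extra entries."""
    if turkic:
        # Apply the Turkic "T" mappings first: I -> dotless i, I-with-dot -> i.
        text = text.replace("I", DOTLESS_I).replace(CAPITAL_I_DOT, "i")
        # Dotless i is then left alone, i.e. it maps to itself.
        return text.casefold()
    # Default folding, followed by the extra "F"-style entry: dotless i -> i.
    return text.casefold().replace(DOTLESS_I, "i")


dotted = "kirik"
dotless = "k\u0131r\u0131k"

# The default folding keeps the two lowercase words distinct...
print(dotted.casefold() == dotless.casefold())
# ...while the default uppercase mappings merge them.
print(dotted.upper() == dotless.upper())
# With the extra entry, folding agrees with uppercasing.
print(fold(dotted) == fold(dotless))
# The Turkic option still keeps the two letters apart.
print(fold(dotted, turkic=True) == fold(dotless, turkic=True))
```

An identity check (user names, file names, host labels) built on the default folding would treat the two words as different, while one built on uppercasing would treat them as the same. That inconsistency is what the extra locale-neutral entry is meant to remove.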

