Doug Ewell wrote:
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
> > You have not read: I'm not interested in the Turkic case, but in NON
> > Turkic languages, exactly with the default rule which:
> > - does not differentiate the dotted uppercase I and the undotted
> > uppercase I when casefolding them to the SAME soft-dotted lowercase i.
> > - but DOES differentiate the soft-dotted lowercase i and the dotless
> > lowercase i, despite the uppercase mapping will drop that difference!
>
> There may be a problem here, but the urgency seems very slight;
I detected it after it produced a security bug (a user record was unexpectedly updated in my database...)

> you'll probably never find dotted uppercase I

Right.

> and dotless lowercase i in non-Turkic languages.

Wrong here: I have found occurrences of dotless lowercase i used instead of soft-dotted lowercase i as the base letter for a diacritic placed above it (it was an acute accent...). There were two sequences that looked identical when rendered but were still distinct in a case-folded comparison:

(1) LATIN SMALL LETTER I, COMBINING ACUTE ACCENT
(2) LATIN SMALL LETTER DOTLESS I, COMBINING ACUTE ACCENT

However, they were no longer distinct once converted to uppercase in a locale-neutral environment that does not use the Turkic rules:

(1') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
(2') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT

String (2) may have been produced to avoid displaying the dot with some fonts that don't apply the soft-dotted rule when there is an additional diacritic above...

For me, strings (1) and (2) are "equivalent" in non-Turkic, locale-neutral environments, and should compare equal in case-insensitive comparisons, exactly like (1') and (2'), their uppercase equivalents.
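The mismatch between sequences (1) and (2) can be reproduced with a short Python sketch. It assumes Python's `str.casefold()` (Unicode default full case folding, without the Turkic special cases) and `str.upper()` (default, locale-independent uppercase mapping) stand in for the "locale-neutral environment" described above. The key function `upper_then_fold` is a hypothetical illustration of the equality argued for here, not a standard Unicode algorithm.

```python
import unicodedata

# Sequence (1): LATIN SMALL LETTER I + COMBINING ACUTE ACCENT
SEQ_1 = "\u0069\u0301"
# Sequence (2): LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT
SEQ_2 = "\u0131\u0301"


def names(s: str) -> list[str]:
    """Return the Unicode character name of each code point in s."""
    return [unicodedata.name(c) for c in s]


def upper_then_fold(s: str) -> str:
    """Hypothetical comparison key (illustration only): uppercase first,
    which maps U+0131 to U+0049, then apply default case folding, which
    maps U+0049 to U+0069. This merges i and dotless i."""
    return s.upper().casefold()


if __name__ == "__main__":
    # Default case folding keeps dotless i distinct from i.
    print(SEQ_1.casefold() == SEQ_2.casefold())  # False
    # Default uppercase mapping turns both into LATIN CAPITAL LETTER I.
    print(SEQ_1.upper() == SEQ_2.upper())  # True
    print(names(SEQ_2.upper()))
    # With the hypothetical key, (1) and (2) compare equal.
    print(upper_then_fold(SEQ_1) == upper_then_fold(SEQ_2))  # True
```

This sketch deliberately ignores the Turkic mappings and performs no normalization, so a precomposed form such as U+00ED (LATIN SMALL LETTER I WITH ACUTE) would still differ from sequence (1) unless both strings are normalized (for example to NFD) first.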

