Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

>> There may be a problem here, but the urgency seems very slight;
>
> I detected it after it produced a security bug (a user record was
> unexpectedly updated on my database...)
> ...
>> and dotless lowercase i in non-Turkic languages.
>
> Wrong here: I have found occurrences of dotless lowercase i, used
> instead of soft-dotted lowercase i, as base letters for diacritics
> added above it (it was an acute accent...)

Don't do that.

> There were two sequences which looked apparently identical when
> rendered, and that were distinct after case folding compare check:
>
> (1) LATIN SMALL LETTER I, COMBINING ACUTE ACCENT
> (2) LATIN SMALL LETTER DOTLESS I, COMBINING ACUTE ACCENT
>
> but were no more distinct when converted to uppercase in a locale
> neutral environment not using the Turkic rules:
>
> (1') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
> (2') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT

OK, so you want the default, locale-neutral case mapping tables to
equate U+0069 with U+0131, right?

This is close to being a spoofing problem, though. See TUS 4.0,
page 141.

> The string (2) may have been produced to avoid displaying the dot
> with some fonts that don't apply the soft-dotted rule when there's
> an additional diacritic above...

Don't do that. That's misusing the standard. The font should be
fixed instead.

> For me, strings (1) and (2) are "equivalent" in non-Turkic locale-
> neutral environments, and should be equal with case-insensitive
> compares, exactly like for (1') and (2'), their uppercase equivalent.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
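
P.S. For anyone who wants to reproduce the mismatch described in the
quoted text, here is a minimal sketch in Python. It assumes Python's
built-in str.casefold() and str.upper(), which apply the default
(non-Turkic) Unicode case operations. The helper names
same_after_casefold and same_after_upper are made up for illustration
and are not from any library or from the original report.

```python
# Sequences (1) and (2) from the quoted message.
s1 = "\u0069\u0301"  # LATIN SMALL LETTER I + COMBINING ACUTE ACCENT
s2 = "\u0131\u0301"  # LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT


def same_after_casefold(a: str, b: str) -> bool:
    """Compare using default (locale-neutral) case folding."""
    return a.casefold() == b.casefold()


def same_after_upper(a: str, b: str) -> bool:
    """Compare using default (locale-neutral) uppercasing."""
    return a.upper() == b.upper()


# Default case folding leaves U+0131 alone, so the strings stay distinct.
print(same_after_casefold(s1, s2))  # False

# Default uppercasing maps both U+0069 and U+0131 to U+0049,
# so the strings collapse to the same sequence (1') == (2').
print(same_after_upper(s1, s2))     # True
```

A lookup keyed on the folded form and an update keyed on the uppercased
form would disagree about whether these two strings name the same
record, which is one plausible way to end up with the kind of
unexpected update mentioned above.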

