Re: Case mapping of dotless lowercase letters

Doug Ewell Mon, 15 Dec 2003 20:36:51 -0800

Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

>> There may be a problem here, but the urgency seems very slight;
>
> I detected it after it produced a security bug (a user record was
> unexpectedly updated on my database...)
> ...
>> and dotless lowercase i in non-Turkic languages.
>
> Wrong here: I have found occurrences of dotless lowercase i, used
> instead of soft-dotted lowercase i, as base letters for diacritics
> added above it (it was an acute accent...)


Don't do that.

> There were two sequences that looked identical when rendered,
> and that were distinct after a case-folding comparison:
>
> (1) LATIN SMALL LETTER I, COMBINING ACUTE ACCENT
> (2) LATIN SMALL LETTER DOTLESS I, COMBINING ACUTE ACCENT
>
> but were no longer distinct when converted to uppercase in a
> locale-neutral environment not using the Turkic rules:
>
> (1') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
> (2') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT

OK, so you want the default, locale-neutral case mapping tables to equate
U+0069 with U+0131, right?
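
The asymmetry described above can be reproduced with a short sketch.
Python is used here only as an assumption for illustration (its
str.casefold() and str.upper() are locale-independent); the variable
names are hypothetical and not part of the original report.

```python
# Sequence (1): LATIN SMALL LETTER I + COMBINING ACUTE ACCENT
seq1 = "i\u0301"
# Sequence (2): LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT
seq2 = "\u0131\u0301"

# Default case folding has no mapping for U+0131, so the two
# sequences stay distinct after folding.
folded_equal = seq1.casefold() == seq2.casefold()

# Default uppercasing maps both U+0069 and U+0131 to U+0049,
# so the two sequences collapse to the same string.
upper_equal = seq1.upper() == seq2.upper()

print(folded_equal, upper_equal)
```

A system that compares stored keys by case folding in one place and by
uppercasing in another would treat the two strings inconsistently.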

This is close to being a spoofing problem, though.  See TUS 4.0, page
141.

> The string (2) may have been produced to avoid displaying the dot
> with some fonts that don't apply the soft-dotted rule when there's
> an additional diacritic above...

Don't do that.  That's misusing the standard.  The font should be fixed
instead.

> For me, strings (1) and (2) are "equivalent" in non-Turkic locale-
> neutral environments, and should be equal with case-insensitive
> compares, exactly like for (1') and (2'), their uppercase equivalent.
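
For illustration only, the comparison requested in the quoted text
could be approximated as follows. This is a minimal sketch, not the
standard's behavior: loose_key is a hypothetical helper, and equating
U+0131 with U+0069 is an assumption that is wrong for Turkic text.

```python
def loose_key(text: str) -> str:
    """Hypothetical comparison key for non-Turkic, locale-neutral use.

    Maps LATIN SMALL LETTER DOTLESS I (U+0131) to LATIN SMALL LETTER I
    (U+0069) before applying default case folding. This deliberately
    departs from the default case folding tables.
    """
    return text.replace("\u0131", "i").casefold()


# Sequences (1), (2) and their shared uppercase form (1')/(2')
lower_dotted = "i\u0301"
lower_dotless = "\u0131\u0301"
upper_form = "I\u0301"

same_lower = loose_key(lower_dotted) == loose_key(lower_dotless)
same_upper = loose_key(lower_dotless) == loose_key(upper_form)
print(same_lower, same_upper)
```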

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
