On Tue, Mar 29, 2011 at 22:40, Lennart Regebro <[email protected]> wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".
>
For that matter, what happens with combining characters?
    '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' !=
        '\N{LATIN SMALL LETTER O WITH DIAERESIS}'
I guess the filesystem shouldn't treat these as the same (even though
they are canonically equivalent), but what if some web service does? I
suspect you should normalize both strings before comparing them
against any blacklist. And what happens with surrogates when you
normalize?
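
A minimal sketch of the kind of comparison I mean, assuming Python 3
and the stdlib unicodedata module. The helper name same_text and the
choice of NFC are just assumptions for illustration, and this says
nothing yet about the surrogate question:

```python
import unicodedata

# Hypothetical helper (not part of any library): bring both strings to
# the same normalization form before comparing them.
def same_text(a, b, form="NFC"):
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

decomposed = "\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}"
precomposed = "\N{LATIN SMALL LETTER O WITH DIAERESIS}"

# A plain comparison sees two different code point sequences.
print(decomposed == precomposed)

# After normalizing both sides, the strings compare equal.
print(same_text(decomposed, precomposed))
```

A blacklist check would then normalize the blacklist entries once and
each incoming string with the same form before the lookup.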
//Lennart
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev