On Tuesday 29 March 2011 at 19:23 +0100, Michael Foord wrote:
> Hey all,
>
> Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible encoding)
> if those names are passed at the command line, or otherwise read in and
> decoded from an assumed-utf8 source with surrogate escaping, the
> surrogate escape decoded names will not match the properly decoded
> blacklisted names.
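
To make the quoted scenario concrete, here is a minimal Python 3 sketch. The name used as a blacklist entry is hypothetical and chosen only for illustration, and the sketch assumes the bytes arrive as Big5 while being decoded as UTF-8 with the surrogateescape error handler:

```python
# Hypothetical blacklist entry, stored as a properly decoded str.
blacklisted = '\u4f60\u597d'

# The same name as raw bytes, as a Big5 locale would hand it over.
raw = blacklisted.encode('big5')

# Decoding under the (wrong) assumption that the bytes are UTF-8.
# surrogateescape maps each undecodable byte to a lone surrogate
# instead of raising an error.
as_utf8 = raw.decode('utf-8', 'surrogateescape')

# Decoding with the encoding the bytes were actually written in.
as_big5 = raw.decode('big5')

print(ascii(as_utf8))           # escaped form, safe to print anywhere
print(as_utf8 == blacklisted)   # False: the blacklist check is bypassed
print(as_big5 == blacklisted)   # True: the right encoding matches

# surrogateescape is lossless: the original bytes can be recovered.
print(as_utf8.encode('utf-8', 'surrogateescape') == raw)  # True
```

The mismatch comes from decoding with the wrong encoding. The surrogateescape handler only changes what the wrong result looks like and keeps the original bytes recoverable.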

Yes, if you decode two byte strings using two different encodings, you get different Unicode strings. That is not related to surrogateescape (PEP 383). Sorry, but '\u4f60\u597d'.encode('big5').decode('latin1') doesn't give you '\u4f60\u597d' but '§A¦n', and nothing warns you that latin1 is not big5 (no UnicodeDecodeError is raised, even with the strict error handler).

I think that the example has two issues:

- security based on blacklists doesn't work (it is better to use a whitelist)
- if filenames are stored as Big5, they must be decoded from Big5, and so the locale encoding must be Big5

I don't understand the last paragraph:

"P.P.S I will further note that you get the same issue even if the blacklist and filename had been in UTF-8, but this time it gets broken from a terminal in the Big5 locale. I didn’t show it this way around because I understand that Python 3 may only have just recently started using the locale to decode argv, rather than being hardcoded to UTF-8."

The Python filesystem encoding is hardcoded to UTF-8 only on Mac OS X; on other operating systems, it is the locale encoding.

Victor