On Tue, Mar 29, 2011 at 07:23:25PM +0100, Michael Foord wrote:
> Hey all,
> 
> Not sure how real the security risk is here:
> 
>     http://blog.omega-prime.co.uk/?p=107
> 
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible
> encoding), and those names are passed at the command line or otherwise
> read in and decoded from an assumed-utf8 source with surrogate
> escaping, the surrogate-escape-decoded names will not match the
> properly decoded blacklisted names.
> 
The example is correct.  The security risk is real.  However, there's a flaw
in the program, and whether there's also a flaw in python is less certain.

Here's the line I'd say is contentious::
  blacklist = open("blacklist.big5", encoding='big5').read().split()
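To make the mismatch concrete, here is a minimal sketch that simulates both decodings in one process instead of reading a real blacklist file and a real command line. The filename "中文" and a UTF-8 locale are assumptions chosen for illustration; they are not from the blog post.

```python
# Hypothetical filename, stored on disk (and in the blacklist file) as Big5 bytes.
raw_name = "中文".encode("big5")

# The contentious line decodes the blacklist file as Big5 text.
from_blacklist = raw_name.decode("big5")

# Under a UTF-8 locale, Python 3 decodes command-line arguments with
# surrogateescape, so bytes that are not valid UTF-8 become lone surrogates.
from_argv = raw_name.decode("utf-8", "surrogateescape")

# Same bytes on disk, two different str values: the membership test misses.
print(from_blacklist == from_argv)  # False

# The surrogate-escaped form still round-trips to the original bytes.
print(from_argv.encode("utf-8", "surrogateescape") == raw_name)  # True
```

Neither step raises or warns, which is why the check fails silently rather than loudly.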

The blacklist file contains a list of filenames.  However, this code treats
it as a list of strings.  This is a logic error in the program, and he should
really be doing this::
  blacklist = open("blacklist.big5", 'rb').read().split()

Then, when comparing it against the values of sys.argv, either sys.argv gets
converted into bytes (using the system locale, since that's what was used to
decode the command-line bytes into unicode) or the items in blacklist get
converted to unicode with surrogateescape.
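Both repairs can be sketched with the standard library's os.fsencode() and os.fsdecode() (Python 3.2+), which on POSIX apply the filesystem encoding with the surrogateescape handler. This is a minimal sketch, not the blog's code: the helper names are hypothetical, and it assumes sys.argv was decoded with the filesystem encoding, which on typical POSIX systems is derived from the locale.

```python
import os


def load_blacklist(path):
    """Read the blacklist as raw bytes: entries are filenames, not text.

    Like the original one-liner, this treats each whitespace-separated
    token as one filename.
    """
    with open(path, "rb") as f:
        return set(f.read().split())


def is_blacklisted_bytes(arg, blacklist):
    """Option 1: turn a sys.argv entry back into the bytes the OS passed in.

    os.fsencode() reverses the surrogateescape decoding Python applied to
    the command line (assumption: argv was decoded with the filesystem
    encoding).
    """
    return os.fsencode(arg) in blacklist


def is_blacklisted_str(arg, blacklist):
    """Option 2: decode the blacklist entries the same way argv was decoded.

    os.fsdecode() uses the filesystem encoding with surrogateescape, so
    undecodable bytes become the same lone surrogates that appear in argv.
    """
    return arg in {os.fsdecode(entry) for entry in blacklist}
```

A script would call load_blacklist() once and then check each item of sys.argv[1:] with either helper. The bytes comparison has the advantage of never guessing an encoding for the blacklist file at all.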

The possible flaw in python is this:  code like the blog poster's runs under
python3 without an error or a warning.  This gives the programmer no
feedback that they're doing something wrong until it actually bites them in
the foot in deployed code.

-Toshio

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev