New submission from Matt Giuca <[email protected]>:
urllib.unquote fails to decode a percent-escape with mixed case. To demonstrate:
>>> unquote("%fc")
'\xfc'
>>> unquote("%FC")
'\xfc'
>>> unquote("%Fc")
'%Fc'
>>> unquote("%fC")
'%fC'
Expected behaviour:
>>> unquote("%Fc")
'\xfc'
>>> unquote("%fC")
'\xfc'
I actually fixed this bug in Python 3, at Guido's request, as part of the huge
fix to issue 3300. To quote Guido:
> # Maps lowercase and uppercase variants (but not mixed case).
> That sounds like a disaster. Why would %aa and %AA be correct but
> not %aA and %Aa? (Even though the old code had the same problem.)
(Indeed, RFC 3986 allows mixed-case percent escapes.)
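To illustrate why a lookup table of the kind the quoted comment describes
misses mixed case, here is a rough sketch. The name hex_to_chr is hypothetical
and the construction is an assumption based on that comment, not a copy of the
library source:

```python
# Hypothetical table (assumed construction): keys are the all-lowercase and
# all-uppercase two-digit hex strings only.
hex_to_chr = dict(('%02x' % i, chr(i)) for i in range(256))
hex_to_chr.update(('%02X' % i, chr(i)) for i in range(256))

print('fc' in hex_to_chr)  # True
print('FC' in hex_to_chr)  # True
print('Fc' in hex_to_chr)  # False: mixed case was never added as a key
```

Any escape whose two hex digits differ in case is absent from such a table, so
a lookup-based decoder would leave it undecoded.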
I have attached a patch which fixes this by removing the dict that maps all
lowercase and uppercase variants to characters, and calling int(item[:2], 16)
instead. It's slower, but correct. This is the same solution we used in Python 3.
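For readers without the patch to hand, the approach might look roughly like the
sketch below. This is not the attached patch: the function name unquote_sketch
is hypothetical, and the explicit hex-digit check is an added assumption, there
because int() on its own also accepts inputs such as "f" or "+f" that are not
valid two-digit escapes.

```python
import string

# Assumption: a valid escape is exactly two characters from 0-9, a-f, A-F.
_HEX_DIGITS = set(string.hexdigits)

def unquote_sketch(s):
    """Decode %xx escapes regardless of hex-digit case (illustrative only)."""
    parts = s.split('%')
    result = [parts[0]]
    for item in parts[1:]:
        code = item[:2]
        if len(code) == 2 and all(c in _HEX_DIGITS for c in code):
            # int(code, 16) treats 'fc', 'FC', 'Fc' and 'fC' identically.
            result.append(chr(int(code, 16)) + item[2:])
        else:
            # Malformed escape: keep it verbatim, including the '%'.
            result.append('%' + item)
    return ''.join(result)
```

With this sketch, unquote_sketch("%Fc") and unquote_sketch("%fC") both give
'\xfc', while a malformed escape such as "%xy" is returned unchanged.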
I've also backported a number of test cases from Python 3 which cover this
issue, as well as genuinely malformed percent-encoding.
Note: I've also backported the remainder of the 'unquote' test cases from
Python 3 but I found another bug, so I will report that separately, with a
patch.
----------
components: Library (Lib)
files: urllib-unquote-mixcase.patch
keywords: patch
messages: 101044
nosy: mgiuca
severity: normal
status: open
title: urllib.unquote doesn't decode mixed-case percent escapes
type: behavior
versions: Python 2.6, Python 2.7
Added file: http://bugs.python.org/file16540/urllib-unquote-mixcase.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8135>
_______________________________________