[issue16679] Wrong URL path decoding

2012-12-17 Thread Claude Paroz
Claude Paroz added the comment: Thanks for the explanations (and history). I realize that changing the behaviour is probably not an option. As an example in a framework, we are currently discussing how we will cope with this in Django: https://code.djangoproject.com/ticket/19468 On the

[issue16679] Wrong URL path decoding

2012-12-16 Thread And Clover
And Clover added the comment: WSGI's usage of ISO-8859-1 for all HTTP-byte-originated strings is very much deliberate; we needed a way to preserve the original input bytes whilst still using unicode strings, and at the time surrogateescape was not available. The result is counter-intuitive

[issue16679] Wrong URL path decoding

2012-12-15 Thread Terry J. Reedy
Changes by Terry J. Reedy tjre...@udel.edu: -- nosy: +aclover, pje ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16679 ___ ___ Python-bugs-list

[issue16679] Wrong URL path decoding

2012-12-15 Thread Phillip J. Eby
Phillip J. Eby added the comment: Wouldn't it be possible to amend PEP ? Sure... except then it would also be necessary to amend PEP , and also all WSGI applications already written that assume this, any time in the last nine years. This is a known and intended consistent property

[issue16679] Wrong URL path decoding

2012-12-14 Thread Claude Paroz
New submission from Claude Paroz: In wsgiref/simple_server.py (WSGIRequestHandler.get_environ), Python 3 is currently populating the env['PATH_INFO'] variable by decoding the URL path, assuming it was encoded with 'iso-8859-1', which appears to be wrong, according to RFC 3986/3987. For

[issue16679] Wrong URL path decoding

2012-12-14 Thread Graham Dumpleton
Graham Dumpleton added the comment: The requirement per PEP is that the original byte string needs to be converted to a native string (Unicode) with the ISO-8859-1 encoding. This is to ensure that the original bytes are preserved so that the WSGI application, with its own knowledge of what
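
The byte-preserving round trip described in this comment can be sketched as follows. This is a minimal illustration, assuming the client sent a UTF-8 percent-encoded path; the helper names `server_side_path_info` and `recover_path` are hypothetical and are not part of wsgiref or of any PEP.

```python
from urllib.parse import unquote_to_bytes


def server_side_path_info(raw_path: str) -> str:
    """Mimic the server side: percent-decode to bytes, then decode as ISO-8859-1.

    ISO-8859-1 maps every byte 0x00-0xFF to exactly one code point, so no
    information is lost, even though the resulting text may look wrong.
    """
    return unquote_to_bytes(raw_path).decode("iso-8859-1")


def recover_path(path_info: str, encoding: str = "utf-8") -> str:
    """Application side: recover the original bytes, then decode them properly.

    The choice of `encoding` is the application's assumption, not the server's.
    """
    return path_info.encode("iso-8859-1").decode(encoding)


path_info = server_side_path_info("/caf%C3%A9")
print(repr(path_info))          # the counter-intuitive intermediate string
print(recover_path(path_info))  # the path as the client meant it
```

The intermediate string is mojibake by design: it is a container for the original bytes, not text meant for display.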

[issue16679] Wrong URL path decoding

2012-12-14 Thread Berker Peksag
Changes by Berker Peksag berker.pek...@gmail.com: -- versions: +Python 3.4 -Python 3.5

[issue16679] Wrong URL path decoding

2012-12-14 Thread Claude Paroz
Claude Paroz added the comment: Attached are my proposed changes. Also, I just came across http://bugs.python.org/issue3300, which finally led Python urllib.parse.quote to default to UTF-8 encoding, after a lengthy discussion. -- keywords: +patch Added file:
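
For reference, the UTF-8 default of `urllib.parse.quote` mentioned in this comment can be observed directly. This is a small sketch using only the standard library; the example path is made up for illustration.

```python
from urllib.parse import quote, unquote

path = "/caf\u00e9"  # "/café", an illustrative path with one non-ASCII character

# With no explicit encoding, quote() encodes the string as UTF-8 first.
utf8_quoted = quote(path)

# An explicit encoding changes which bytes end up percent-encoded.
latin1_quoted = quote(path, encoding="iso-8859-1")

print(utf8_quoted)
print(latin1_quoted)

# unquote() likewise assumes UTF-8 unless told otherwise.
print(unquote(utf8_quoted) == path)
```

The same character therefore yields two different percent-encoded forms depending on the encoding, which is why the receiving side has to agree on one.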

[issue16679] Wrong URL path decoding

2012-12-14 Thread Graham Dumpleton
Graham Dumpleton added the comment: You can't try UTF-8 and then fall back to ISO-8859-1. PEP requires it always be ISO-8859-1. If an application needs it as something else, it is the web application's job to do it. The relevant part of the PEP is: On Python platforms where the str or
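
If an application does want the "UTF-8 first, ISO-8859-1 otherwise" behaviour, it can apply that policy itself to the string the server hands over. The following is only a sketch of one such policy; `decode_path_info` is a hypothetical helper, and the fallback choice is an assumption made by the application, not something required by the PEP.

```python
def decode_path_info(path_info: str) -> str:
    """Re-decode a WSGI PATH_INFO value inside the application.

    The server is assumed to have decoded the raw bytes as ISO-8859-1, so
    encoding with the same codec gives back those bytes unchanged.
    """
    raw = path_info.encode("iso-8859-1")
    try:
        # Application-level assumption: well-behaved clients send UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the ISO-8859-1 interpretation as received.
        return path_info


environ = {"PATH_INFO": "/caf\u00c3\u00a9"}  # UTF-8 bytes seen through ISO-8859-1
print(decode_path_info(environ["PATH_INFO"]))
```

Keeping this logic in the application leaves the server's behaviour uniform, which is the property the comment above insists on.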

[issue16679] Wrong URL path decoding

2012-12-14 Thread Claude Paroz
Claude Paroz added the comment: I may understand your reasoning when you cannot make any assumptions about the encoding of a series of bytes. I think that the case of PATH_INFO is different, because it should comply with standards, and then you *can* make the assumption that the original path