2009/4/16 Miles Kaufmann <mile...@umich.edu>:
> On Sat, Apr 11, 2009 at 8:48 PM, Miles Kaufmann wrote:
>> The first issue is that there doesn't seem to be a way to parse
>> x-www-form-urlencoded query strings in a character set other than
>> UTF-8, for example:
>>
>> 'premier=un&deuxi%E8me=deux' # latin-1
>>
>> The urllib.parse.unquote* functions take encoding and errors
>> parameters, but none of the higher-level ones do. The solution to me
>> seems to be that functions that build on top of
>> them--urllib.parse.parse*, cgi.parse*, and the cgi.FieldStorage
>> constructor--should grow encoding and errors parameters that they pass
>> through to the lower-level functions.
>>
>> The second issue is that the FieldStorage classes work with text input
>> streams. However, with multipart/form-data posts, posted files aren't
>> necessarily in the same encoding as form fields, or may be binary and
>> not text at all. I would suggest that FieldStorage should be changed
>> to take a binary input stream.
>>
>> [...]
>
> I'm not quite sure how to interpret the lack of response I've gotten
> on this topic. Is it just that there's little interest in the cgi
> module? Should I raise this issue on the python-dev list, or just
> open a bug report and start submitting patches?
>
> There's been a lot of discussion recently about bytes vs. str in email
> headers and WSGI environ variables, but I haven't been able to find a
> substantive discussion on this specific topic. Here are some of the
> related quotes I've come across.
>
> Martin v. Löwis wrote [1]:
>> In a CGI application, you shouldn't be using sys.stdin or print().
>> Instead, you should be using sys.stdin.buffer (or sys.stdin.buffer.raw),
>> and sys.stdout.buffer.raw. A CGI script essentially does binary IO;
>> if you use TextIO, there likely will be bugs (e.g. if you have
>> attachments of type application/octet-stream).
>
> bobince wrote [2]:
>> Evan Fosmark wrote:
>>> bobince wrote:
>>>> So yeah, it's a bug in cgi.py, yet another victim of 2to3 conversion
>>>> that hasn't been fixed properly for the new string model. It should
>>>> be converting the incoming byte stream to characters before
>>>> passing them to urllib.
>>>>
>>>> Did I mention Python 3.0's libraries (especially web-related
>>>> ones) still being rather shonky? :-)
>>>
>>> Yeah. So far I've noticed huge problems with cgi, urllib, and
>>> wsgiref. I hope they get fixed soon. :(
>>
>> Indeed. Momentum in WEB-SIG seems to have ground to a halt; no-one
>> seems to want ownership of the issue. Very disappointing.
>
> There's also this bug report [3], but it doesn't directly propose the
> changes that I have.
>
> So: does anyone agree, or disagree, that cgi.FieldStorage should be
> changed to take byte streams, and many of the cgi and urllib.parse
> functions should become encoding-aware, preferably in time for Python
> 3.1? The byte-stream change will break compatibility with Python
> 3.0, but I strongly feel that treating POST data as text is wrong and
> should not continue to be supported.
>
> -Miles Kaufmann
>
> [1]: http://mail.python.org/pipermail/python-dev/2009-April/088727.html
> [2]: http://stackoverflow.com/questions/540342/python-3-0-urllib
> [3]: http://bugs.python.org/issue4953
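
For reference, the first issue in the quoted message can be reproduced
with a short sketch. It uses the query string from the message and the
encoding parameter of urllib.parse.unquote that the message mentions.
The parse_qs call is an assumption about the reader's environment: it
relies on the encoding and errors parameters that parse_qs gained in
Python 3.2, which were not available in 3.0. The variable names are
illustrative only.

```python
from urllib.parse import parse_qs, unquote

# Query string from the quoted message; the second key is
# percent-encoded Latin-1, not UTF-8.
query = 'premier=un&deuxi%E8me=deux'

# unquote() defaults to encoding='utf-8', errors='replace'. The lone
# byte 0xE8 is not valid UTF-8, so it becomes U+FFFD.
as_utf8 = unquote('deuxi%E8me')

# Naming the encoding explicitly recovers the intended character.
as_latin1 = unquote('deuxi%E8me', encoding='latin-1')

# Assumption: Python 3.2 or later, where parse_qs() accepts encoding
# and errors and passes them to the lower-level unquoting functions.
fields = parse_qs(query, encoding='latin-1')

print(repr(as_utf8))
print(repr(as_latin1))
print(fields)
```

Without the encoding argument, parse_qs falls back to UTF-8 in the same
way as unquote, so the key comes back with a replacement character
instead of the accented letter.
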

Have you read:

  http://bugs.python.org/issue3300

This was referenced in a prior post here and is likely relevant. A lot
of the discussion of that issue happened on the developers list for
Python 3.0.

I'm not sure why someone was taking issue with the WEB-SIG list over
cgi FieldStorage issues, as I don't recollect us having any substantive
discussion about it or any problems it has.

Graham
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig