2009/4/2 Guido van Rossum <gu...@python.org>:
> On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking <i...@colorstudy.com> wrote:
>> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <gu...@python.org> wrote:
>>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fuman...@aminus.org> wrote:
>>>> Good timing. We had been thinking to make everything strings except for
>>>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
>>>> from the Request-URI, which may be in any encoding. It was thought that
>>>> the app would be best-qualified to decode those three.
>>>
>>> Argh. The *meaning* of these fields is clearly text. It would be most
>>> unfortunately if all apps were required to deal with decoding bytes
>>> for these (there is no choice any more, unlike in 2.x). I appreciate
>>> the sentiment that the encoding is unknown, but I would much prefer it
>>> if there was a default encoding that the app could override, or if
>>> there was some other mechanism whereby the app would not have to be
>>> bothered with decoding bytes unless it cared.
>>
>> This might be fine, except it is hard. You can't just take arbitrary
>> bytes and do script_name.decode('utf8'), and then when you realize you
>> had it wrong do script_name.encode('utf8').decode('latin1').
>
> Well you could make the bytes versions available under different keys.
> I think you do something a bit similar this in webob, e.g. req.params
> vs. req.str_params. (Perhaps you could have QUERY_STRING and
> QUERY_BYTES.) The decode() call used to create the text strings could
> use 'replace' as the error handler and the app could check for the
> presence of the replacement character ('\ufffd') in the string to see
> if there was a problem; or it could just work with the string
> containing that character and report the user some kind of 40x or 50x
> error. Frameworks (like webob) would of course do the right thing and
> look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
> be optional.
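A minimal sketch of the two points quoted above, assuming a path that arrived as latin-1 bytes. The sample value and the variable names are illustrative only and do not come from any WSGI server:

```python
# Hypothetical request path sent as latin-1; the byte 0xE9 is not valid UTF-8.
raw = b"/caf\xe9"

# Decoding with the 'replace' error handler never raises, and the
# replacement character marks the spot where decoding went wrong.
text = raw.decode("utf-8", errors="replace")
has_problem = "\ufffd" in text

# The original byte is gone, so re-encoding and guessing a second
# encoding cannot recover it.
utf8_round_trip_ok = text.encode("utf-8") == raw

# latin-1 maps every byte to exactly one code point, so a latin-1
# decode can always be reversed to get the original bytes back.
latin1_text = raw.decode("latin-1")
latin1_round_trip_ok = latin1_text.encode("latin-1") == raw
```

Here `has_problem` is true and `utf8_round_trip_ok` is false, which is the difficulty Ian describes, while `latin1_round_trip_ok` is true.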
Can we please not invent new names at the global level of the WSGI
environment dictionary, especially ones that mutate existing names
rather than using a prefix or suffix?

If we are going to carry values in two different formats, then use the
'wsgi' namespace. Thus, for the byte versions of values, perhaps use:

  wsgi.request_uri
  wsgi.script_name
  wsgi.path_info
  wsgi.query_string

etc.

In other words, leave all the existing CGI variables to come through
decoded as latin-1, and do anything new in the 'wsgi' variable
namespace, identifying only the minimal set which needs to be made
available as bytes.

Graham
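A rough sketch of how that layout might look in practice. The 'wsgi.*' key names are the ones suggested in this message and are not part of any published WSGI specification. The helpers build_environ() and decoded() are hypothetical, and the fallback for a missing bytes key is an assumption:

```python
# Map each CGI variable to the proposed bytes key in the 'wsgi' namespace.
BYTES_KEYS = {
    "SCRIPT_NAME": "wsgi.script_name",
    "PATH_INFO": "wsgi.path_info",
    "QUERY_STRING": "wsgi.query_string",
}


def build_environ(raw_values):
    """Server side: expose each value as latin-1 text and as raw bytes."""
    environ = {}
    for cgi_key, wsgi_key in BYTES_KEYS.items():
        raw = raw_values.get(cgi_key, b"")
        environ[cgi_key] = raw.decode("latin-1")  # existing CGI name, text
        environ[wsgi_key] = raw                   # new namespaced name, bytes
    return environ


def decoded(environ, cgi_key, encoding="utf-8"):
    """App side: prefer the bytes value, else reverse the latin-1 decode."""
    wsgi_key = BYTES_KEYS[cgi_key]
    if wsgi_key in environ:
        raw = environ[wsgi_key]
    else:
        # Assumption: the text value was produced by a latin-1 decode,
        # so encoding it as latin-1 gives the original bytes back.
        raw = environ[cgi_key].encode("latin-1")
    return raw.decode(encoding, errors="replace")


environ = build_environ({"PATH_INFO": "/caf\u00e9".encode("utf-8")})
path = decoded(environ, "PATH_INFO")
```

An application that does not care about encodings can keep reading PATH_INFO as text, and one that does care can call something like decoded() with the encoding it expects.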