Graham Dumpleton wrote:
> 2009/4/2 Guido van Rossum <gu...@python.org>:
> > On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking <i...@colorstudy.com> wrote:
> >> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <gu...@python.org> wrote:
> >>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fuman...@aminus.org> wrote:
> >>>> Good timing. We had been thinking to make everything strings except for
> >>>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
> >>>> from the Request-URI, which may be in any encoding. It was thought that
> >>>> the app would be best-qualified to decode those three.
> >>>
> >>> Argh. The *meaning* of these fields is clearly text. It would be most
> >>> unfortunate if all apps were required to deal with decoding bytes
> >>> for these (there is no choice any more, unlike in 2.x). I appreciate
> >>> the sentiment that the encoding is unknown, but I would much prefer it
> >>> if there was a default encoding that the app could override, or if
> >>> there was some other mechanism whereby the app would not have to be
> >>> bothered with decoding bytes unless it cared.
> >>
> >> This might be fine, except it is hard. You can't just take arbitrary
> >> bytes and do script_name.decode('utf8'), and then when you realize you
> >> had it wrong do script_name.encode('utf8').decode('latin1').
> >
> > Well you could make the bytes versions available under different keys.
> > I think you do something a bit similar to this in webob, e.g. req.params
> > vs. req.str_params. (Perhaps you could have QUERY_STRING and
> > QUERY_BYTES.) The decode() call used to create the text strings could
> > use 'replace' as the error handler and the app could check for the
> > presence of the replacement character ('\ufffd') in the string to see
> > if there was a problem; or it could just work with the string
> > containing that character and report to the user some kind of 40x or 50x
> > error. Frameworks (like webob) would of course do the right thing and
> > look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
> > be optional.
>
> Can we please not invent new names at global context in WSGI
> environment dictionary, especially ones that mutate existing names
> rather than using a prefix or suffix.
>
> If we are going to carry values in two different formats, then use the
> 'wsgi' name space. Thus, for byte versions of values perhaps use:
>
>   wsgi.request_uri
>   wsgi.script_name
>   wsgi.path_info
>   wsgi.query_string
>   etc
>
> In other words, leave all the existing CGI variables to come through
> as Latin-1 decoded and do anything new in 'wsgi' variable namespace,
> identifying only the minimal set which needs to be made available as
> bytes.

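The 'replace' error handler idea quoted above, together with the objection that a wrong decode cannot simply be undone, can be sketched roughly as follows. The sample value and variable names are made up for illustration; nothing here is part of WSGI itself.

```python
# Hypothetical raw query string: 'café' encoded as Latin-1, which is not
# valid UTF-8.
raw_query = b"name=caf\xe9"

# Decode with the 'replace' error handler, as suggested in the quoted text.
text = raw_query.decode("utf-8", errors="replace")

# The replacement character marks input that could not be decoded.
has_problem = "\ufffd" in text

# The substitution is lossy: re-encoding does not give back the original
# bytes, so the app cannot recover them from the text value alone.
recovered = text.encode("utf-8")

print(repr(text), has_problem, recovered == raw_query)
```

This is why the quoted proposal pairs the text value with a separate bytes key: once U+FFFD has been substituted, the original octets are gone.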
Some thoughts:

1. If we always decode as Latin-1 it should be lossless, and consumers
   could retrieve the original bytes with val.encode('Latin-1'), thus
   removing the need for separate entries.
2. CGI says, "REMOTE_USER = *OCTET" :(
3. Bikeshed: "wsgi.xyz" is too close to "XYZ" in my opinion.

Robert Brewer
fuman...@aminus.org
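A minimal sketch of the Latin-1 round trip from point 1, assuming Python 3 semantics, where going from str back to bytes is spelled `encode`. The helper names (`to_environ_str`, `original_bytes`, `redecode`) are hypothetical and only illustrate the idea; they are not part of any WSGI API.

```python
def to_environ_str(raw: bytes) -> str:
    """What a server might do: decode every byte 1:1 via Latin-1."""
    return raw.decode("latin-1")


def original_bytes(value: str) -> bytes:
    """What a consumer might do: get the exact original bytes back."""
    return value.encode("latin-1")


def redecode(value: str, encoding: str = "utf-8") -> str:
    """Reinterpret the value in the encoding the app actually expects."""
    return original_bytes(value).decode(encoding)


# Hypothetical path that the client sent as UTF-8.
raw_path = "/caf\u00e9".encode("utf-8")
environ_value = to_environ_str(raw_path)

# The Latin-1 view looks wrong as text, but no information is lost.
assert original_bytes(environ_value) == raw_path
assert redecode(environ_value) == "/caf\u00e9"

# Latin-1 maps all 256 byte values, so the round trip holds for any input.
all_bytes = bytes(range(256))
assert original_bytes(to_environ_str(all_bytes)) == all_bytes
```

The design point is that Latin-1 assigns a code point to every byte value, so decoding never fails and never substitutes, unlike a UTF-8 decode with an error handler.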