Alan Kennedy wrote:
> Hi Graham,
> I think yours is a good solution to the problem.
> [Graham]
> > In other words, leave all the existing CGI variables to come through
> > as latin-1 decode
> As latin-1 or rfc-2047 decoded, to unicode.
> > and do anything new in 'wsgi' variable namespace,
> So the server provides
> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> "wsgi.server_decode_charset" == u"utf-8"

I think everyone at the sprint today acquiesced to having
SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode. The
server can decide which charset to decode them with (probably subject to
configuration). I've implemented this in the python3 branch of CherryPy
and it seems to work brilliantly.
Assuming the server *is* configurable, deployers should be able to
choose Latin-1 if they need to recover the original bytes, without
having to support a separate set of encoded-byte entries.
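
To illustrate the Latin-1 escape hatch: Latin-1 maps every byte to exactly
one code point, so decoding and re-encoding is lossless. A minimal sketch,
assuming a request path that was really UTF-8 on the wire (the sample
value and variable names are made up for the example):

```python
# Raw bytes as they might arrive on the wire: "/café" encoded as UTF-8.
# (Illustrative value, not taken from any real request.)
raw_path = b"/caf\xc3\xa9"

# A server configured for Latin-1 decodes each byte to one code point,
# so this can never fail, whatever the bytes are.
path_info = raw_path.decode("latin-1")

# The application recovers the original bytes losslessly...
recovered = path_info.encode("latin-1")

# ...and re-decodes them with the charset it actually expects.
text = recovered.decode("utf-8")

# The round trip holds for every possible byte value.
all_bytes = bytes(range(256))
round_trip_ok = all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```

This is why a single unicode entry per variable is enough: an application
that needs the bytes can always get them back from a Latin-1 decoded value.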

Side note: wrapping the wsgi.input fp in a DecodingWrapper before
handing it to cgi works great, too. No need to rewrite the cgi module to
support bytes as I feared.
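
For readers wondering what such a wrapper might look like, here is a rough
sketch. It is not the CherryPy code; the class body, the default charset,
and the sample request body are all assumptions made for illustration. It
only shows the idea of decoding bytes read from wsgi.input before a
text-oriented parser sees them:

```python
import io


class DecodingWrapper:
    """Hypothetical sketch: wrap a byte stream and return str from reads.

    Decoding each chunk on its own is only safe for single-byte charsets
    such as Latin-1; a multi-byte charset would need an incremental
    decoder so characters split across reads are handled correctly.
    """

    def __init__(self, fp, charset="latin-1"):
        self.fp = fp
        self.charset = charset

    def read(self, size=-1):
        # size counts bytes in the underlying stream, not characters.
        return self.fp.read(size).decode(self.charset)

    def readline(self, size=-1):
        return self.fp.readline(size).decode(self.charset)

    def __iter__(self):
        for line in self.fp:
            yield line.decode(self.charset)


# Stand-in for a WSGI environ whose input stream yields bytes.
environ = {"wsgi.input": io.BytesIO(b"name=caf\xe9&x=1\nsecond line\n")}
wrapped = DecodingWrapper(environ["wsgi.input"])
first_line = wrapped.readline()
rest = wrapped.read()
```

The wrapped object would then be passed to the parser in place of the raw
wsgi.input stream.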

Robert Brewer

