Mark Hammond wrote:

> I don't think Python explicitly converts it - the CRT's ANSI version
> of environ is used

Yes, it would be the CRT on Python 2.x. (Python 3.0 on non-NT always converts using UTF-8, if I'm reading convertenviron right.)

> so the resulting strings should be encoded using the 'mbcs' encoding.
> What mangling do you see?

Correct, it's characters unencodable in mbcs that are lost*. mbcs is never equivalent to UTF-8 (which would allow us to recover characters on IIS) or ISO-8859-1 (which would allow us to recover characters on Apache-for-Windows), so there's always heavy lossage.

(* - replaced with ? or Windows's attempt to substitute something that looks vaguely like the original character.)
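To make the lossage concrete, here is a rough sketch in Python 3 syntax. It assumes cp1252 as a stand-in for the configured codepage, since the real 'mbcs' codec only exists on Windows and varies by system; the helper name through_ansi is made up for illustration. Python's 'replace' handler only produces '?', so it does not reproduce Windows's look-alike substitutions.

```python
# cp1252 stands in for 'mbcs' here (an assumption: 'mbcs' is Windows-only
# and maps to whatever ANSI codepage the system is configured with).
ANSI_CODEPAGE = "cp1252"


def through_ansi(text):
    """Round-trip text through the ANSI codepage, roughly as an ANSI environ would."""
    # Characters with no cp1252 equivalent become '?' and cannot be recovered.
    return text.encode(ANSI_CODEPAGE, "replace").decode(ANSI_CODEPAGE)


path = "/caf\u00e9/\u65e5\u672c"   # e-acute is in cp1252, the two CJK characters are not
mangled = through_ansi(path)
print(ascii(mangled))              # prints '/caf\xe9/??'
```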

> win32api and ctypes would both let you call the Windows API.

Ah! I had considered the win32 extensions but it's a bit of a dependency... I'd forgotten that we get ctypes for free in 2.5.

So we'd be looking at:

    ctypes.windll.kernel32.GetEnvironmentVariableW(u'PATH_INFO', ...)

when CPython 2.5+/NT is detected, right? That increases the number of situations in which we can feasibly recover URIs that are valid UTF-8 sequences (modulo the slash anyway). Doing the actual recovery still requires some server-sniffing though.
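Filling in the elided arguments, something along these lines might work. It is only a sketch in Python 3 syntax: the function name get_environ_unicode is hypothetical, and it relies on the documented behaviour that GetEnvironmentVariableW returns the required buffer size (in characters, including the terminating null) when the buffer is too small, and 0 on failure.

```python
import ctypes
import sys


def get_environ_unicode(name):
    """Return an environment variable as Unicode text, or None if unavailable.

    Hypothetical helper: bypasses the CRT's ANSI environ by asking the
    Windows API for the wide-character value directly.
    """
    if sys.platform != "win32":
        return None  # GetEnvironmentVariableW only exists on Windows

    func = ctypes.windll.kernel32.GetEnvironmentVariableW
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_ulong]
    func.restype = ctypes.c_ulong

    # First call with an empty buffer to learn the required size.
    size = func(name, None, 0)
    if size == 0:
        return None  # not set (or another failure)

    buf = ctypes.create_unicode_buffer(size)
    if func(name, buf, size) == 0 and size > 1:
        return None  # value vanished between the two calls
    return buf.value


path_info = get_environ_unicode("PATH_INFO")
```

On anything other than Windows this simply returns None, so the caller can fall back to os.environ.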

> What is IIS doing wrong here?

It's not wrong as such. There are three reasonable choices for decoding header values before putting them in a Unicode environment, and the CGI spec, as it knows nothing about Unicode environment variables, fails to specify which:

    1. ISO-8859-1 (which ensures bytes can be recovered)
    2. UTF-8 (since most URIs are effectively UTF-8 today)
    3. Configured system codepage (mbcs)

Apache [with mod_cgi or mod_wsgi] decides on (1). IIS tries for (2), falling back to (3) on invalid sequences. The text concerning Python 3.0 in the WSGI Amendments page could be read as blessing Apache's behaviour.
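As a sketch of what recovery could look like once the server's choice is known (Python 3 syntax; the function name recover_utf8 and the server_decoding labels are invented for illustration, and how you detect the server is left out entirely):

```python
def recover_utf8(value, server_decoding):
    """Try to recover the UTF-8 reading of a value the server already decoded.

    server_decoding is a hypothetical label for which of the three choices
    the server made: "iso-8859-1", "utf-8" or "mbcs".
    """
    if server_decoding == "iso-8859-1":
        # Choice (1) is lossless: every byte maps to one code point, so the
        # original bytes can be rebuilt and re-decoded.
        raw = value.encode("iso-8859-1")
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return value  # the request was not valid UTF-8; leave it alone
    # Choice (2) is already the UTF-8 reading. Choice (3) may have dropped
    # characters, and nothing here can bring them back.
    return value


raw_request = b"/caf\xc3\xa9"                    # UTF-8 bytes as sent on the wire
as_apache = raw_request.decode("iso-8859-1")     # what choice (1) hands to the app
print(ascii(recover_utf8(as_apache, "iso-8859-1")))   # prints '/caf\xe9'
```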

However wsgiref.simple_server currently also goes for (2), although that probably can't be considered canonical. I'd be interested to know what other WSGI servers do.

And Clover