On 09/17/2010 02:03 PM, Armin Ronacher wrote:
> In case we change the spec as Ian mentioned above, I am all for a "wsgi.guessed_encoding" = True flag or something like that.
Yes, I'd like to see that. I think going with *only* a raw-or-reconstructed path_info, rather than having both path_info and PATH_INFO, is best, for the middleware-duplication reasons PJE mentioned.
A more in-depth possibility might be a wsgi.path_accuracy key with these levels:

0: script_name/path_info have been crudely reconstructed from SCRIPT_NAME/PATH_INFO from an unknown source. Beware! If there is to be backwards compatibility with WSGI1, this would be treated as the default value when path_accuracy is missing.
1: script_name/path_info have been reconstructed, but it is known that path_info is accurate, other than %2F and non-ASCII issues. That is, it's known that the path doesn't come from IIS's broken PATH_INFO, or the IIS error has been detected and compensated for.
2: script_name/path_info have been reconstructed using known-good encodings for the env. The only way in which they may differ from the original request path is that a slash might originally have been a %2F. (This is good enough for the vast majority of applications.)
3: script_name/path_info come directly from the request path without any intervening mangling.
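As an illustration only, here is a minimal Python sketch of how an application might consult such a key. `wsgi.path_accuracy` is just the name floated above, not part of any published WSGI spec, and the constants and helper names (`path_accuracy`, `can_trust_non_ascii`, `can_trust_encoded_slashes`) are hypothetical. A missing key is treated as level 0, per the backwards-compatibility note.

```python
# Hypothetical constants for the proposed (not standardised) levels.
PATH_ACCURACY_CRUDE = 0           # reconstructed from unknown-source SCRIPT_NAME/PATH_INFO
PATH_ACCURACY_NO_IIS_BUG = 1      # accurate except for %2F and non-ASCII issues
PATH_ACCURACY_KNOWN_ENCODING = 2  # only possible difference: a '/' may have been %2F
PATH_ACCURACY_RAW = 3             # taken directly from the request path


def path_accuracy(environ):
    """Return the declared level; a WSGI1 server won't set it, so default to 0."""
    return environ.get("wsgi.path_accuracy", PATH_ACCURACY_CRUDE)


def can_trust_non_ascii(environ):
    # Levels 2 and 3 promise the path was decoded with known-good encodings.
    return path_accuracy(environ) >= PATH_ACCURACY_KNOWN_ENCODING


def can_trust_encoded_slashes(environ):
    # Only level 3 keeps the distinction between '/' and '%2F'.
    return path_accuracy(environ) >= PATH_ACCURACY_RAW
```

An application that never puts encoded slashes in its URLs would only need `can_trust_non_ascii`; one that does would have to check for level 3 and fall back to some other scheme otherwise.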
> Unless I am mistaken, the same is true for CGI scripts running on Apache2 on Windows.
Yes, it's true of *all* CGI scripts, and also of non-CGI scripts on IIS.
> I did some tests a while ago and was pretty sure that Apache2 on Windows did the same.
Apache-on-Windows puts the bytes of the decoded path into the environment variables as one code unit per byte: that is, as if the bytes had been decoded as ISO-8859-1. You still have to read the environ using ctypes, because the ANSI code page ('mbcs') that os.environ goes through is never ISO-8859-1; but at least the original bytes are recoverable, which isn't the case with IIS.
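A minimal Python sketch of the recovery step, assuming the variable has already been read as a wide-character string (the ctypes read itself is not shown) and assuming the application expects UTF-8 paths. `recover_path_bytes` and `decode_path` are hypothetical helper names.

```python
def recover_path_bytes(value: str) -> bytes:
    """Undo the one-code-unit-per-byte mapping.

    Each character U+0000..U+00FF stands for exactly one original byte,
    which is precisely what the ISO-8859-1 codec maps back to bytes.
    """
    return value.encode("iso-8859-1")


def decode_path(value: str, encoding: str = "utf-8") -> str:
    """Recover the raw bytes, then decode them with the encoding the
    application actually uses (UTF-8 is an assumption here)."""
    return recover_path_bytes(value).decode(encoding)


# A request for /caf%C3%A9 arrives as the bytes b"/caf\xc3\xa9",
# which Apache exposes as the code units "/caf\u00c3\u00a9".
print(decode_path("/caf\u00c3\u00a9"))
```

A value containing any character above U+00FF cannot have come from this one-byte-per-code-unit mapping, so the `encode("iso-8859-1")` call raises `UnicodeEncodeError`; that corresponds to the IIS case, where the original bytes are already lost.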
> The correct place for these hacks would be the appropriate WSGI/Web3 handler of the webserver.
The IIS PATH_INFO-prefix hack would be appropriate to put in an IIS-specific handler; indeed, I believe isapi_wsgi does just that. But the other hacks are specific to CGI.
For CGI, there is no 'handler of the webserver'; there is only the standard CGI-to-WSGI adapter, so this is the only component it is reasonable to burden with the hacks. Frameworks and libraries further up the stack cannot reliably do the fixups, because they don't know whether the WSGI environ they have been given comes from os.environ or somewhere else, or whether middleware has played with it.
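A minimal Python sketch of the IIS PATH_INFO-prefix fixup, and of why it has to be applied in exactly one place. `fix_iis_path_info` is a hypothetical name; the trailing-slash comparison is, as far as I know, comparable to the check in `wsgiref.handlers.IISCGIHandler` from Python 3.2 onward. The sketch assumes IIS has put the full path, SCRIPT_NAME included, into PATH_INFO.

```python
def fix_iis_path_info(environ):
    """Strip the SCRIPT_NAME prefix that IIS wrongly leaves on PATH_INFO.

    Safe only when the caller knows the environ came straight from IIS
    and that nothing has fixed it already.
    """
    script = environ.get("SCRIPT_NAME", "")
    path = environ.get("PATH_INFO", "")
    # Compare with trailing slashes so "/app" does not match "/application".
    if (path + "/").startswith(script + "/"):
        environ["PATH_INFO"] = path[len(script):]
    return environ


# Application mounted at /app, request for /app/app/page.
env = {"SCRIPT_NAME": "/app", "PATH_INFO": "/app/app/page"}
fix_iis_path_info(env)  # PATH_INFO is now "/app/page" (correct)
fix_iis_path_info(env)  # PATH_INFO is now "/page" (corrupted)
```

The fixup is not idempotent: a framework that re-applies it "just in case" cannot distinguish a genuine leading /app segment from the IIS prefix, which is why it belongs in the adapter that knows where the environ came from.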
--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/

Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig