2009/8/4 Ian Bicking <i...@colorstudy.com>:
> So... what about WSGI 2? Let's not completely drop the ball on this.
> I *think* we were largely in agreement; debate got distracted by some
> async stuff, but I don't think we particularly have to deal with that
> for WSGI 2. I think we do more than enough if we figure out: WSGI in
> Python 3, i.e., with unicode; some basic errata kind of stuff, like
> readline signature; change the callable signature to remove
> start_response.
>
> Would this be a new PEP or a revision? I think it should be a new
> PEP, as WSGI 1 remains valid and the same as it always was, and PEP
> 333 describes that. Is there anyone willing to make the revisions?

But is the intention to skip straight to WSGI 2.0 for Python 3.0, with start_response() being eliminated, or are we going to provide an amended WSGI 1.0 for Python 3.0? I can't see how we can avoid the latter, so we should focus on that first rather than on more fundamental changes in WSGI 2.0.

In respect of WSGI 1.0 for Python 3.0, I have pretty well come to the conclusion that where we were heading before is wrong in one area. I was about to make changes to mod_wsgi in line with what I believe should be done and just release it without consultation, given that I couldn't see any discussion reaching a conclusion about it soon.

Since you have sent this email, I will try one last time to get a resolution on WSGI 1.0 for Python 3.0. If I can't get one, I guess the choices are to release the change anyway and provide an implementation incompatible with what others are guessing should be done, or just rip all the code out and not support Python 3.0 at all. Either seems entirely reasonable, since there is no WSGI 1.0 specification for Python 3.0 and the issue again looks to be getting avoided by skipping to a discussion of WSGI 2.0 instead.

So, for a WSGI 1.0 style of interface on Python 3.0, the following is what I was going to implement.

1. When running under Python 3, applications SHOULD produce bytes output, status line and headers.

This is effectively what we had before. The only difference is to clarify that the 'status line' values should also be bytes. This wasn't noted before. I had already updated the proposed WSGI 1.0 amendments page to mention this.

2. When running under Python 3, servers and gateways MUST accept strings for output, status line and headers. Such strings must be converted to bytes using 'latin-1'. If a string cannot be converted, that is treated as an error.

This is again what we had before, except that it now mentions the 'status line' value.

3. When running under Python 3, servers MUST provide wsgi.input as a binary (byte) input stream.

No change here.

4. When running under Python 3, servers MUST provide a text stream for wsgi.errors. In converting this to a byte stream for writing to a file, the default encoding would be applied.

No real change here except to clarify that the default encoding would apply. Use of the default encoding could be problematic, though, when combining different WSGI components. Each WSGI component may have been developed on a system with a different default encoding, so one may expect to log characters that can't be written on a different setup. Not sure how you could solve that except to say people should have the default encoding be UTF-8 for portability.

5. When running under Python 3, servers MUST provide CGI HTTP and server variables as strings. Where such values are sourced from a byte string, be that a Python byte string or a C string, they should be decoded as 'UTF-8'. If a specific web server infrastructure is able to support different encodings, then the WSGI adapter MAY provide a way for a user of the WSGI adapter to customise what encoding is used, on a global basis or on a per value basis, but this is entirely optional. Note that there is no requirement to deal with RFC 2047.

This is where I am going to diverge from what has been discussed before. The reason I am going to pass values as UTF-8 and not latin-1 is that it looks like Apache effectively only supports use of UTF-8. Since this means that mod_wsgi, and the Apache modules for FASTCGI, SCGI, AJP and even CGI, likely cannot handle anything besides UTF-8, I really can't see the point of trying to cater for the theoretical possibility that some HTTP client could use something besides UTF-8. In other words, the predominant case will be UTF-8, so let us target that.
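
As a rough illustration of points 2 and 5 above, here is a minimal sketch of what a server or gateway might do on each side. The helper names `to_wire_bytes` and `decode_environ_value` are hypothetical, made up for this example; they are not part of mod_wsgi or of any WSGI specification.

```python
def to_wire_bytes(value):
    """Point 2 (sketch): accept bytes as-is, convert str using latin-1.

    A str that cannot be encoded as latin-1 raises UnicodeEncodeError,
    which the caller would treat as an application error.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("latin-1")
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def decode_environ_value(raw, encoding="utf-8"):
    """Point 5 (sketch): turn a byte string from the web server into str.

    UTF-8 is the default; the optional 'encoding' argument stands in for
    the configuration option a WSGI adapter MAY offer.
    """
    return raw.decode(encoding)


# Usage: status line and header values supplied as str are converted to bytes,
# and a raw request path from the server is decoded before going into environ.
status = to_wire_bytes("200 OK")
header_value = to_wire_bytes("text/plain; charset=utf-8")
path_info = decode_environ_value("/caf\u00e9".encode("utf-8"))
```

With this arrangement the application only ever sees ordinary strings in the environment, and any choice of a non-default encoding stays in server configuration.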

So, rather than burden every WSGI application with the need to convert from latin-1 back to bytes and then to UTF-8, let the server deal with it, with the server using a sensible default. Where the server infrastructure can handle a different encoding, it can provide an option to use that encoding, and the WSGI application doesn't need to change.

Now, the reason why Apache can't really handle anything besides UTF-8 relates to how filenames are encoded in the file system.

Taking Windows first, as it is the more obvious case. What Apache does there is take whatever path it has that maps to a script file, be it constructed partially from what is in the Apache configuration and partially from what was supplied in the URL by the client, and convert it to UCS2 for passing to the Windows file system routines. In converting to UCS2, Apache assumes that the path is UTF-8. This means that the Apache configuration file has to be UTF-8, and that the URL as supplied by the client has to be UTF-8 as well, after any URL character encoding is decoded. End result: it can only handle UTF-8.

For UNIX systems, Apache doesn't do any conversion of the path, but passes it directly to the file system routines. On a Linux system supporting UTF-8 file system paths, that path also needs to be UTF-8, which again implies that the Apache configuration is UTF-8 and that the decoded client URL used in matching the resource is also UTF-8. Again, by association of all the moving parts, they must all be UTF-8.

Now, what I am talking about here is the file system path constructed from a file system location and some leading prefix of the URL, which is used to match the script file. So for the URL, this is the SCRIPT_NAME part, where it matches to a file system resource such as a script. Obviously there is going to be some amount of URL left over, i.e., PATH_INFO and QUERY_STRING.
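
To make the latin-1 round trip mentioned above concrete, here is a small sketch comparing the two schemes for one request path. The function name `recode_latin1_as_utf8` and the sample path are assumptions chosen for illustration only.

```python
def recode_latin1_as_utf8(value):
    """What each application would have to do under the latin-1 scheme:
    recover the original bytes, then decode them as UTF-8."""
    return value.encode("latin-1").decode("utf-8")


# Bytes for a path as a client would send it once URL escapes are decoded.
raw = "/caf\u00e9".encode("utf-8")

# latin-1 scheme: the server decodes as latin-1, so the application sees
# one character per byte and has to repair the value itself.
as_latin1 = raw.decode("latin-1")
repaired = recode_latin1_as_utf8(as_latin1)

# UTF-8 scheme: the server decodes as UTF-8 and the application uses it directly.
as_utf8 = raw.decode("utf-8")
```

Both routes end at the same string; the difference is only whether the conversion happens once in the server or in every application.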

It has already been shown, though, that the SCRIPT_NAME part has to be UTF-8, and we would really be entering fantasy land if you were somehow going to cope with some different encoding for PATH_INFO and QUERY_STRING. Instead it is like the GPL, viral in nature. Use of UTF-8 in one particular area means you are effectively bound to use UTF-8 everywhere else.

A further example of why UTF-8 reaches into everything is the mod_rewrite module for Apache. This allows you to do stuff related to the SCRIPT_NAME, PATH_INFO and QUERY_STRING parts of a URL. It has already been shown that the Apache configuration file has to be UTF-8. If the URL isn't, then it wouldn't be possible to perform matches against non latin-1 characters in a rewrite condition or rule. This is because your match string would be in a different encoded form to that in the URL and so wouldn't match.

Now this is all for Apache. Unless they do strange stuff, I would expect that other web servers such as lighttpd, nginx and Cherokee would also have this UTF-8 dependence all through them. This would potentially leave only pure Python web servers that might be able to handle doing stuff in some other encoding. But although that may technically be possible, should that dictate what is done for everyone, given that the number of people wanting to use a different encoding is likely to be small or nonexistent, especially if servers wanting to handle different encodings could provide a configuration option to allow it anyway and thus not burden the WSGI application?

In summary, it just seems more sane to have stuff in the WSGI environment be dealt with as UTF-8.

So, can we please address this rather than being distracted by WSGI 2.0. The same issue is going to have to be dealt with for WSGI 2.0 anyway, but working it out now means that we can at least deliver a WSGI 1.0 update for Python 3.0.
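
The mod_rewrite point above comes down to comparing bytes in two different encoded forms. The following sketch shows that effect with a byte-level regular expression; it is only an illustration of the mismatch, not how mod_rewrite is implemented, and the rule and URLs are made-up examples.

```python
import re

# A rule as it would be stored in a UTF-8 encoded configuration file.
rule = re.compile(re.escape("/caf\u00e9".encode("utf-8")))

# The same URL path as sent by two hypothetical clients.
url_utf8 = "/caf\u00e9/menu".encode("utf-8")
url_latin1 = "/caf\u00e9/menu".encode("latin-1")

# Only the URL that uses the same encoding as the configuration matches.
matches_utf8 = rule.match(url_utf8) is not None
matches_latin1 = rule.match(url_latin1) is not None
```

Here `matches_utf8` is true and `matches_latin1` is false, since the non-ASCII character is two bytes in one form and a single, different byte in the other.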

Graham