On Wed, Sep 23, 2009 at 2:38 PM, P.J. Eby <p...@telecommunity.com> wrote:
> At 08:42 AM 9/23/2009 +0200, Armin Ronacher wrote:
>
>> > I then propose that we eliminate SCRIPT_NAME and PATH_INFO.  Instead we
>> > have:
>> IMO they should stick around for compatibility with older applications
>> and be latin1 encoded on Python 3.  But the use is discouraged.
>>
>
> One or the other should be there, not both.  If you allow older code to
> work, this means it could change the old ones but not the new, leaving a
> confused mess for child applications to sort out.

This is my strongly-held opinion as well.  It's been a struggle to get
people to provide accurate SCRIPT_NAMEs, and to represent the idea of
SCRIPT_NAME through SCRIPT_NAME (as opposed to a hodge-podge of different
patterns, configuration, etc).  To provide this information twice would be
a big step backwards, allowing for all sorts of weird bugs and inconsistent
behavior when the two weren't in sync, and depending on which key is given
preference in code.

I *wish* SCRIPT_NAME and PATH_INFO had been strictly required in WSGI 1
(they are in CGI, but not WSGI).  If they were, we'd see more of
environ['PATH_INFO'], which would break fast and obviously, and less
environ.get('PATH_INFO', '').  But... too late for that now.  The new key
should definitely be required.  Then code can even do:

    if 'wsgi.path_info' in environ:
        path_info = urllib.unquote(environ['wsgi.path_info'])
    else:
        path_info = environ.get('PATH_INFO', '')

We should also make sure the new validator works on both versions of WSGI,
which will make it easier to backport checks like making sure
wsgi.path_info is *not* in a WSGI 1 environ.

Not directly in response to this email, several people expressed concern
that some environments provide only the unquoted path.  I think it's not
terribly horrible if they fake it by re-quoting the path.
In CGI/Python 3 this would be something like:

    environ['wsgi.script_name'] = urllib.request.quote(
        os.environ['SCRIPT_NAME'].encode(sys.getdefaultencoding(),
                                         'surrogateescape'))

(obviously urllib.request.quote needs to be fixed to work on bytes; though
the implementation is also small enough we could show the correct
implementation in the spec, and warn implementors not to trust
urllib.request.quote to work in Python 3.0-3.1.1)

I also believe you can safely reconstruct the real SCRIPT_NAME/PATH_INFO
from REQUEST_URI, which is usually available (at least in contexts where
this sort of thing is a problem).  I am not up to thinking it through right
now, as it's not a trivial algorithm, but I'm sure it can be done.  Really
it's just a question of how much you can avoid brute force, because you
could always do:

    def real_path(request_uri, script_name, path_info):
        # Try every prefix of the still-quoted URI, shortest first
        for i in range(len(request_uri) + 1):
            if urllib.request.unquote(request_uri[:i]) == script_name:
                return request_uri[:i], request_uri[i:]
        # Something is messed up, fake it
        return (urllib.request.quote(script_name),
                urllib.request.quote(path_info))

I think you could do better than character-by-character (instead by path
segment), and in particular do it faster when %2f doesn't appear in the
path at all (the common case).  This would be appropriate code for wsgiref.

>> If we go about dropping start_response, can we move the app iter to the
>> beginning?  That would be consistent with the signature of common
>> response objects, making it possible to do this:
>>
>>   response = Response(*hello_world(environ))
>>
>
> When you say "beginning", do you mean the beginning of the return tuple?
> That is:
>
>    return ['body here'], '200 OK', [('Header', 'value')]
>
> I'd be surprised if a lot of response objects had such a signature, since
> that's not the order a server would actually output that data in.

It'd be more reasonable to change the Response __init__ signature, like:

    class Response(object):
        def __init__(self, body_or_wsgi_response, status=None, headers=None):
            if isinstance(body_or_wsgi_response, tuple):
                status, headers, body = body_or_wsgi_response
            else:
                body = body_or_wsgi_response

If you allow an iterator for a body argument, it could be a tuple; but at
least WebOb doesn't allow iterators, only str/unicode.  (You can give an
iterator, but you need to do it with an app_iter keyword argument.)  I
don't know what Werkzeug or other frameworks allow.

>> In general I think doing too many changes at once is harmful
>>
>
> Actually, the reverse is true for standards.  Incremental change means more
> versions, which goes counter to the point of having a standard in the first
> place.

Yeah; WSGI 1.1 is just errata, I expect to change very little code.  I'd
rather make just one change to WSGI 2.  And it doesn't seem so hard really.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker
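
A rough sketch of the path-segment variant of real_path mentioned above,
written for a current Python 3 where quote and unquote live in
urllib.parse.  The name real_path_by_segment is made up for illustration,
and the sketch assumes that SCRIPT_NAME always ends on a segment boundary
(so PATH_INFO is empty or starts with "/") and that plain UTF-8 unquoting
is good enough for the comparison.  It is not a vetted algorithm, just one
way the idea could look:

```python
from urllib.parse import quote, unquote


def real_path_by_segment(request_uri, script_name, path_info):
    """Split the still-quoted request path into (script_name, path_info).

    Hypothetical helper: walks the raw path one segment at a time instead
    of one character at a time.
    """
    # Only the path is split; the query string is dropped.
    path = request_uri.partition('?')[0]
    segments = path.split('/')  # '/a%2Fb/c' -> ['', 'a%2Fb', 'c']

    if '%2f' not in path.lower():
        # Common case: no quoted slashes, so the number of '/' in
        # script_name says exactly how many raw segments it covers.
        candidates = [script_name.count('/') + 1]
    else:
        # A quoted slash may hide inside a segment: try each boundary.
        candidates = range(1, len(segments) + 1)

    for n in candidates:
        prefix = '/'.join(segments[:n])
        if unquote(prefix) == script_name:
            return prefix, path[len(prefix):]

    # Something is messed up (e.g. the URL was rewritten), fake it
    return quote(script_name), quote(path_info)
```

For example, real_path_by_segment('/a%2Fb/c', '/a/b', '/c') keeps the
quoted slash inside the script name, while a request whose URI does not
match SCRIPT_NAME at all falls back to re-quoting, as in the brute-force
version.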
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com