On Fri, Jul 16, 2010 at 9:43 PM, Chris McDonough <chr...@plope.com> wrote:
> > Nah, not nearly that hard:
> >
> > path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> >
> > I don't see the problem? If you want to distinguish %2f from /, then
> > you'll do it slightly differently, like:
> >
> > path_parts = [
> >     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> >     for p in environ['wsgi.raw_path_info'].split('/')]
> >
> > This second recipe is impossible to do currently with WSGI.
> >
> > So... before jumping to conclusions, what's the hard part with using
> > text?
>
> It's extremely hard to swallow Python 3's current disregard for the
> primacy of bytes at I/O boundaries. I'm trying, but I can't help but
> feel that the existence of an API like "unquote_to_bytes" is more
> symptom treatment than solution. Of course something that unquotes a
> URL segment unquotes it into bytes; it's the only sane default because
> URL segments found in URLs on the internet are bytes.

Yes, URL-quoted strings should decode to bytes, though arguably the
UTF-8 default that urllib.parse.quote/unquote uses is also very
reasonable. So it's really just a question of naming: whether the plain
name goes to the to-string version or the to-bytes version. Which
honestly... whatever.

> So I guess the "hard part" is more meta. When you have legitimate
> backwards compatibility constraints, suboptimal choices made during
> protocol design are excusable. But it just seems really very weird to
> design one (WSGI 2) from scratch with such choices when the only reason
> to do so is a systematic low-level denial of reality. Why would we use
> (and, worse, by doing so, implicitly promote) such a system in the first
> place?
>
> On the other hand, indignance about the issue shouldn't rule the day
> either. To me, the most pragmatic thing to do that doesn't deny reality
> would be to use bytes.
> It's also the easiest thing to remember (the
> values in the environment are all bytes) and I think we'll be able to
> drive the Py3K stdlib forward in a much saner direction if we choose
> bytes than if we choose text to represent things that are naturally more
> bytes-like.

I do feel like indignance has played a part here. And in my brief
forays into Python 3 I have been frustrated by the over-textification
of APIs. But... if a compromise works, let's not let those experiences
color our choices.

So, here are my criteria for resolving this particular Python 3 issue:

* We should not lose information from the request. Decoding with UTF-8
  (without surrogateescape) would be an example of losing information.
  URL-decoding currently loses us information, which is why I wouldn't
  be sad to see it go (though if that were the only reason I wouldn't
  bother -- the unicode issue just makes it serendipitous).

* We shouldn't produce wildly inaccurate strings. E.g., decoding
  something with Latin1 when it's an implausible encoding.

* Encoding/decoding errors should only possibly happen at the
  application level, or maybe in middleware if you are playing around
  with stuff. Servers specifically should never have them (because
  they can't gracefully handle them).

* We should avoid server configuration with respect to application
  policy (we've avoided it so far, yay!)

* We should support eclectic application layouts, e.g., an application
  that sometimes serves Latin-1, sometimes UTF-8 (like if the
  application proxies requests or serves up legacy content/apps).

* We should make things as easy to port as possible. Errors in porting
  should be loud.

* As much as possible WSGI should be readable and usable. Maybe most
  people will use a library, but we also have a lot of libraries that
  handle WSGI, and it's nice that this has been able to happen, so we
  don't want to make things any harder than they have to be.
  E.g., clearly we should use text environ keys (luckily we don't have
  to worry about non-ASCII header names, I guess?)

-- 
Ian Bicking | http://blog.ianbicking.org
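
For reference, a minimal sketch of the two recipes quoted at the top of
this message, plus the surrogateescape round trip that the first
criterion alludes to. Note the assumptions: 'wsgi.raw_path_info' is only
the key proposed in this thread, not part of any published WSGI spec,
and the sample path and function names are made up for illustration.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical environ. 'wsgi.raw_path_info' is the key proposed in this
# thread (an assumption), holding the still-URL-quoted path as text.
environ = {'wsgi.raw_path_info': '/files/a%2Fb/caf%C3%A9'}


def decoded_path(environ):
    """First recipe: unquote the whole path, then decode it as UTF-8."""
    raw = environ['wsgi.raw_path_info']
    return unquote_to_bytes(raw).decode('UTF-8')


def decoded_segments(environ):
    """Second recipe: split on '/' first, so a quoted %2F stays inside
    its own segment instead of looking like a path separator."""
    raw = environ['wsgi.raw_path_info']
    return [unquote_to_bytes(p).decode('UTF-8') for p in raw.split('/')]


def lossless_text(raw_bytes):
    """Decode as UTF-8 without losing information: bytes that are not
    valid UTF-8 are mapped to lone surrogates instead of raising."""
    return raw_bytes.decode('UTF-8', 'surrogateescape')


def original_bytes(text):
    """Inverse of lossless_text: recover exactly the bytes received."""
    return text.encode('UTF-8', 'surrogateescape')
```

With the sample environ, decoded_path() can no longer tell the quoted
%2F apart from a real slash, while decoded_segments() keeps 'a/b' as a
single segment. And a Latin-1 byte string such as b'caf\xe9' survives
lossless_text() followed by original_bytes() unchanged, which is the
"don't lose information" property, at the cost of text that contains
surrogates until the application decides on an encoding.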
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com