On Fri, 2010-07-16 at 20:46 -0500, Ian Bicking wrote:
> On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough <chr...@plope.com> wrote:
> > What are the concrete problems you envision with text request headers,
> > text (URL-quoted) path, and text response status and headers?
> >
> > Documentation is the main reason. For example, the documentation for
> > making sense of path_info segments in a WSGI that used unicodey-strings
> > would, as I understand it, read something like this:
>
> Nah, not nearly that hard:
>
> path_info = urllib.parse.unquote_to_bytes(
>     environ['wsgi.raw_path_info']).decode('UTF-8')
>
> I don't see the problem? If you want to distinguish %2f from /, then
> you'll do it slightly differently, like:
>
> path_parts = [
>     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
>     for p in environ['wsgi.raw_path_info'].split('/')]
>
> This second recipe is impossible to do currently with WSGI.
>
> So... before jumping to conclusions, what's the hard part with using
> text?

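For anyone following along, here is a minimal sketch of how the two quoted recipes differ in practice. The sample path is made up, and 'wsgi.raw_path_info' is the key proposed in the quoted message rather than something existing WSGI servers provide, so the environ dict below is only a stand-in.

```python
from urllib.parse import unquote_to_bytes

# Stand-in environ. 'wsgi.raw_path_info' is the key proposed in the quoted
# message (an assumption here), holding the still-URL-quoted path as text.
environ = {'wsgi.raw_path_info': '/files/a%2Fb/caf%C3%A9'}
raw = environ['wsgi.raw_path_info']

# Recipe 1: unquote the whole path to bytes, then decode as UTF-8.
# The encoded slash (%2F) becomes indistinguishable from a real separator.
path_info = unquote_to_bytes(raw).decode('UTF-8')

# Recipe 2: split on literal '/' first, then unquote each segment.
# The encoded slash stays inside its segment.
path_parts = [unquote_to_bytes(p).decode('UTF-8') for p in raw.split('/')]

print(path_info)   # /files/a/b/café
print(path_parts)  # ['', 'files', 'a/b', 'café']
```

Splitting the result of the first recipe on '/' yields five parts instead of four, because by then the %2F already looks like a path separator.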
It's extremely hard to swallow Python 3's current disregard for the
primacy of bytes at I/O boundaries. I'm trying, but I can't help but
feel that the existence of an API like "unquote_to_bytes" is more
symptom treatment than solution. Of course something that unquotes a
URL segment unquotes it into bytes; it's the only sane default because
URL segments found in URLs on the internet are bytes.

So I guess the "hard part" is more meta. When you have legitimate
backwards compatibility constraints, suboptimal choices made during
protocol design are excusable. But it seems very weird to design a
protocol (WSGI 2) from scratch with such choices when the only reason
to do so is a systematic low-level denial of reality. Why would we use
(and, worse, by doing so, implicitly promote) such a system in the
first place?

On the other hand, indignation about the issue shouldn't rule the day
either. To me, the most pragmatic thing to do that doesn't deny reality
would be to use bytes. It's also the easiest thing to remember (the
values in the environment are all bytes), and I think we'll be able to
drive the Py3K stdlib forward in a much saner direction if we choose
bytes than if we choose text to represent things that are naturally
more bytes-like.

- C