Using bytes for all `environ` values is easy to understand on the application side as long as you are aware of the encoding problem. The cost is inconvenience, but that's probably OK. It's also simpler to implement on the gateway/server side.
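For instance, an application that expects UTF-8 request paths only has to do the decoding itself. A minimal sketch, assuming `environ['PATH_INFO']` is bytes under this proposal and picking 'replace' as just one possible error policy:

> # PATH_INFO is assumed to arrive as raw bytes.
> raw_path = environ['PATH_INFO']
> # Decode as UTF-8; 'replace' is one way to cope with invalid sequences.
> path = raw_path.decode('utf-8', errors='replace')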
By choosing bytes, WSGI passes the encoding problem to the application, which is good. Let the application deal with it: it's more likely to know what it needs and which problems it can ignore. I think that 99% of the time applications will just decode bytes to str using UTF-8, ignoring invalid values.

However, it's likely that we'll see middlewares converting ALL environ values to UTF-8, because that's more convenient than working with bytes. And some middlewares might come to depend on `environ` values being str instead of bytes, because that's convenient too. This issue was already raised by Graham, and I think it's important to make it clear.

I believe that the 'server/CGI' values in the environment shouldn't be modified (it should of course still be possible to add new values). This way the stack always remains in a 'sane' state. For example, if a middleware wants to convert environ values to UTF-8, it shouldn't do this:

> for key, value in environ.items():
>     environ[key] = str(value, encoding='utf8')

But rather something like this, assuming there are only bytes in `environ`:

> environ['unicode.environ'] = dict((key, str(value, encoding='utf8'))
>                                   for key, value in environ.items())

I'm in favor of using bytes everywhere, but it's important to document why bytes are used and how to use them. I'm not sure this should be included in a PEP; maybe in a "WSGI best practices" document?

Cheers,

--
Henry Prêcheur
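A fuller sketch of that 'unicode.environ' pattern packaged as a middleware. This is only an illustration: the function name is made up, only bytes values are assumed, and the UTF-8/'replace' decoding policy is one possible choice:

> def unicode_environ_middleware(app):
>     # Expose decoded copies of the bytes environ values under a separate
>     # key, leaving the original 'server/CGI' entries untouched.
>     def wrapped(environ, start_response):
>         environ['unicode.environ'] = dict(
>             (key, value.decode('utf-8', errors='replace'))
>             for key, value in environ.items()
>             if isinstance(value, bytes))
>         return app(environ, start_response)
>     return wrapped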