Henry Precheur wrote:
> On Mon, Sep 21, 2009 at 09:14:13PM +0200, Armin Ronacher wrote:
> > So the same standard should have different behavior on different
> > Python versions? That would make framework code a lot more complicated.
>
> I don't understand why it would be 'a lot more' complicated.
>
> (The following code snippets is Python 3 only, and assumes we're using
> 'native strings' everywhere)
>
> In the gateway, environ would be populated this way:
>
>     environ['some_key'] = some_value.decode('utf8', 'surrogateescape')
>
> Compare that to the utf-8-then-latin-1 alternative:
>
>     try:
>         environ['some_key'] = some_value.decode('utf-8')
>         environ['some_key.encoding'] = 'utf-8'
>     except UnicodeError:
>         environ['some_key'] = some_value.decode('latin-1')
>         environ['some_key.encoding'] = 'latin-1'
>
> What you would have in the application to get the original value:
>
>     environ['some_key'].encode('utf8', 'surrogateescape')
>
> With utf8-then-latin1:
>
>     environ['some_key'].encode(environ['some_key.encoding'])
>
> The 'surrogateescape' way is clearly simpler.

It looks simpler until you have a site that is not primarily utf-8.
In that case, every component that needs the real text has to repeat
the re-encode step, so the cost becomes (1 line * number of middlewares
in the WSGI stack * each request). With wsgi.uri_encoding, the cost is
either (1 line * 1 middleware designed to transcode * each request),
or even 0 if your whole site uses just one charset.


Robert Brewer
fuman...@aminus.org
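
To make the trade-off concrete, here is a minimal Python 3 sketch for a value that arrives latin-1 encoded. Only the `environ` keys and the `decode`/`encode` calls come from the thread. The function names, the sample bytes, and the way `wsgi.uri_encoding` is populated are assumptions made for illustration, not part of any agreed WSGI spec.

```python
# Sketch only. All function names here are hypothetical helpers.

def gateway_surrogateescape(raw):
    """Gateway side, 'surrogateescape' way: always decode as UTF-8."""
    return {'some_key': raw.decode('utf8', 'surrogateescape')}


def gateway_utf8_then_latin1(raw):
    """Gateway side, utf-8-then-latin-1 way: record which codec worked."""
    environ = {}
    try:
        environ['some_key'] = raw.decode('utf-8')
        environ['some_key.encoding'] = 'utf-8'
    except UnicodeError:
        environ['some_key'] = raw.decode('latin-1')
        environ['some_key.encoding'] = 'latin-1'
    return environ


def gateway_uri_encoding(raw, charset):
    """Assumed shape of the wsgi.uri_encoding approach: the gateway decodes
    with a configured charset and records it in environ."""
    return {'some_key': raw.decode(charset), 'wsgi.uri_encoding': charset}


def recover_text(environ, site_charset):
    """Under surrogateescape, each component on a non-UTF-8 site has to
    get the bytes back and decode them again with the site's charset."""
    raw = environ['some_key'].encode('utf8', 'surrogateescape')
    return raw.decode(site_charset)


raw = b'caf\xe9'  # 'cafe' with an acute accent, latin-1 encoded; invalid UTF-8

env_a = gateway_surrogateescape(raw)
env_b = gateway_utf8_then_latin1(raw)
env_c = gateway_uri_encoding(raw, 'latin-1')

# Both of Henry's variants round-trip to the original bytes.
original_a = env_a['some_key'].encode('utf8', 'surrogateescape')
original_b = env_b['some_key'].encode(env_b['some_key.encoding'])

# On a latin-1 site, env_a still needs recover_text() in every component
# that wants readable text, whereas env_c is usable as-is.
text_a = recover_text(env_a, 'latin-1')
text_c = env_c['some_key']
```

With the surrogateescape gateway, `env_a['some_key']` holds a lone surrogate in place of the accented character, so each middleware that inspects the value would call something like `recover_text` itself. If the gateway already decoded with the site's charset, no component needs that line.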