Graham Dumpleton wrote:
> Robert, do you have any comments on the restricting of response
> content to bytes and not allow fallback to conversion per latin-1?
>
> I heard that in CherryPy WSGI server you are only allowing bytes. What
> is your rationale for that at the moment?
In Python 2.x, one could easily mix unicode strings and byte strings in the same interface, because they mostly supported the same operations. Not so in Python 3.x: byte strings are missing everything from capitalize() to zfill() [1]. I feel that choosing one type or the other is required in order to avoid mountains of if-statements in middleware (and lots of 'pass' statements if bytes are found).

I decided that the single type should be byte strings because I want WSGI middleware and applications to be able to choose the encoding of their output. Passing unicode to the server would require some out-of-band method of telling the server which encoding to use per response, which seemed unacceptable.

The downside, already alluded to, is that middleware cannot then call e.g. response.capitalize() or any of a number of other methods without first decoding the response. And it cannot do that reliably unless (again) the encoding which was used to produce the bytes is communicated down the stack out of band.

The python3 branch of CherryPy is by no means complete. I'd be happy to explore emitting unicode if we could decide on a method whereby apps could inform the server which encoding they want. Middleware which transcoded the response would need a means of overriding that. But of course, that opens a whole new can of worms if something goes wrong, because application authors want control over the error response; if the server is encoding the response, and an error occurs, there would have to be a way to pass control back up the stack to...what? Whichever component last set the encoding? That road starts to get complicated very quickly.

If some middleware needs to treat the response as unicode, I'd rather emit bytes and somehow return the encoding as part of the response. Perhaps WSGI 2's mythical "return (status, headers, body-iterable, encoding)". Middleware could then decode/transcode as desired.
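To make the four-tuple idea concrete, here is a minimal sketch. It assumes a hypothetical calling convention in which the application receives only environ and returns (status, headers, body-iterable, encoding); nothing like this exists in the WSGI spec, and the names app and transcode are made up for illustration. The body stays bytes all the way down, and middleware decodes only when it needs to.

```python
import codecs


def app(environ):
    # Hypothetical "WSGI 2" style application: it picks its own encoding
    # and reports it alongside the byte body.
    body = "caf\u00e9 au lait".encode("utf-8")
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    return "200 OK", headers, [body], "utf-8"


def transcode(wrapped, target="latin-1"):
    """Hypothetical middleware that re-encodes the response body to `target`."""
    def wrapper(environ):
        status, headers, body, encoding = wrapped(environ)
        if encoding is None or encoding.lower() == target.lower():
            # Nothing to do: pass the response through untouched.
            return status, headers, body, encoding

        new_headers = []
        for name, value in headers:
            lname = name.lower()
            if lname == "content-length":
                continue  # the byte length may change after re-encoding
            if lname == "content-type":
                # Naive rewrite: keeps the media type, drops other parameters.
                value = value.split(";")[0].strip() + "; charset=" + target
            new_headers.append((name, value))

        def chunks():
            # An incremental decoder copes with multi-byte characters that
            # are split across two chunks of the body iterable.
            decoder = codecs.getincrementaldecoder(encoding)()
            for chunk in body:
                text = decoder.decode(chunk)
                if text:
                    yield text.encode(target)
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail.encode(target)

        return status, new_headers, chunks(), target
    return wrapper
```

Calling transcode(app)({}) yields a four-tuple whose last item is "latin-1" and whose body iterable produces latin-1 bytes. Note that an encoding failure would surface lazily, while the body is being iterated, which is exactly the error-handling question raised above.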
I can't think of a downside to that, other than some lost cycles spent de/encoding, but perhaps there are some I don't yet foresee.

Robert Brewer
fuman...@aminus.org

[1] See http://docs.python.org/dev/py3k/library/stdtypes.html#string-methods
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com