On Tue, May 5, 2009 at 10:14 PM, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> 2009/5/6 Ian Bicking <i...@colorstudy.com>:
> > Philip Jenvey brought this to my attention:
> >
> > http://www.python.org/dev/peps/pep-0383/
> >
> > It's a UTF8 encoding and decoding scheme that encodes illegal bytes in such
> > a way that you can decode to get the original bytes object, and thus
> > transcode to another encoding. It's intended for cases exactly like WSGI.
>
> Care to explain then how that would in practice be used while I try
> and reread it a few times to try and understand it myself? :-)

I don't know exactly, but I think you'd do something like:

    environ['PATH_INFO'] = urllib.unquote(http_byte_path).decode('utf8', 'python-escape')

Then, if the encoding was wrong, you could transcode like this:

    environ['PATH_INFO'] = environ['PATH_INFO'].encode('utf8', 'python-escape').decode('latin1', 'python-escape')

Note that you need to know both the encoding that was used (utf8 in this case) and that python-escape was used. It has been suggested that the server should put the encoding it used into the environment. When transcoding, that value should also be updated.

It's not clear what python-escape is going to do; I don't think that's been determined. Probably it'll put \x00 or something in the unicode string to mark raw bytes.

--
Ian Bicking | http://blog.ianbicking.org
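For readers on Python 3: PEP 383 shipped in Python 3.1 with the error handler named 'surrogateescape' rather than 'python-escape', and it marks each undecodable byte as a lone surrogate in the range U+DC80..U+DCFF rather than with \x00. The sketch below ports the two steps above to that API. The helper names decode_path and transcode_path and the environ key "example.path_encoding" are hypothetical, made up only to illustrate the suggestion that the server record the encoding it used; none of them is part of WSGI.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical environ key recording which codec the server used.
# Not part of any WSGI spec; it only illustrates the suggestion above.
ENCODING_KEY = "example.path_encoding"


def decode_path(environ, http_byte_path, encoding="utf-8"):
    """Percent-decode the raw path and decode it losslessly (PEP 383)."""
    raw = unquote_to_bytes(http_byte_path)
    # Bytes that are invalid in `encoding` become lone surrogates
    # U+DC80..U+DCFF instead of raising UnicodeDecodeError.
    environ["PATH_INFO"] = raw.decode(encoding, "surrogateescape")
    environ[ENCODING_KEY] = encoding


def transcode_path(environ, new_encoding):
    """Recover the original bytes and re-decode them with another codec."""
    old_encoding = environ[ENCODING_KEY]
    # Encoding with the same codec and handler restores the exact bytes.
    raw = environ["PATH_INFO"].encode(old_encoding, "surrogateescape")
    environ["PATH_INFO"] = raw.decode(new_encoding, "surrogateescape")
    # Keep the recorded encoding in sync with PATH_INFO.
    environ[ENCODING_KEY] = new_encoding


environ = {}
decode_path(environ, b"/caf%C3%A9/%FF")
# environ["PATH_INFO"] is now '/café/\udcff': the 0xFF byte survives as U+DCFF.
transcode_path(environ, "latin-1")
# environ["PATH_INFO"] is now '/cafÃ©/ÿ', decoded from the same original bytes.
```

Decoding with latin-1 never fails, since every byte maps to a character, so the error handler on that final decode only matters when transcoding to a codec that can reject bytes.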
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig