On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-12-08 21:45, Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Such application specific error handlers could then also apply
>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>> to Unicode escapes, private code points, etc. as seen fit by the
>>> application.
>>
>> I'd argue that such fancy round-trip safe error handler should be provided by
>> Python. It's not reasonable to expect application coders to come up with their
>> own codec variation based on subtle details of the unicode spec.
>
> Fair enough. We could add some e.g.
>
> * a round-trip safe escape error handler that uses a Unicode private
>   code point area which we officially reserve for the Python
>   interpreter
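A minimal Python 3 sketch of the kind of handler proposed above, assuming a hypothetical handler name 'pua-escape' and an arbitrarily chosen block U+F700..U+F7FF (Python defines neither). A single function is registered with codecs.register_error and serves both directions. It also illustrates the round-trip objection in the reply: under ASCII the private code points are unencodable, so the encode side of the handler runs and the bytes come back; under UTF-8 they are ordinary encodable characters, so the handler is never consulted and genuine private-use text collides with escaped bytes.

```python
import codecs

# ASSUMPTION: U+F700..U+F7FF is picked purely for illustration; Python has
# never officially reserved this (or any) private-use block.
PUA_BASE = 0xF700

def pua_escape(exc):
    """Hypothetical 'pua-escape' error handler (illustration only)."""
    if isinstance(exc, UnicodeDecodeError):
        # Map each undecodable byte 0xNN to the private code point PUA_BASE + 0xNN.
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        # Map PUA_BASE + 0xNN back to the raw byte 0xNN; anything else still fails.
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            cp = ord(ch)
            if not PUA_BASE <= cp <= PUA_BASE + 0xFF:
                raise exc
            out.append(cp - PUA_BASE)
        return bytes(out), exc.end  # Python 3 encoders copy bytes replacements as-is
    raise exc

codecs.register_error("pua-escape", pua_escape)

# ASCII cannot encode the private code points, so the handler runs both ways.
raw = b"Andr\xe9"
text = raw.decode("ascii", "pua-escape")
assert text == "Andr\uf7e9"
assert text.encode("ascii", "pua-escape") == raw

# UTF-8 can encode them, so the encode handler is never called.
genuine = b"Andr\xef\x9f\xa9"                          # UTF-8 for 'Andr\uf7e9'
escaped = raw.decode("utf-8", "pua-escape")
assert genuine.decode("utf-8", "pua-escape") == escaped  # two inputs, one str
assert escaped.encode("utf-8", "pua-escape") != raw      # original bytes are lost
```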
This would of course alter the behaviour of those private code points,
preventing them from round-tripping properly.  I don't think
round-tripping can be done from an error handler.  You need a full
codec to do it.  A simple option is ISO-8859-1.  Or, ya know, bytes.
This has long since gotten repetitive.

> * a human readable escape error handler that encodes the problem
>   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
>   encoded directory name instead of failing

Similar to 'ö'.encode('ascii', 'backslashreplace')?  I'm +1 on making
that work.

> * a warning error handler that replaces the problem cases with
>   a question mark and issues a warning through the warning
>   framework

I dub thee errors='warnreplace'.

--
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
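A minimal Python 3 sketch of the last two handlers, assuming hypothetical handler names 'hexescape' and 'warnreplace' (the latter is only the name suggested in this thread; neither is a built-in handler). hexescape is a decode-side analogue of the encode-only 'backslashreplace'; warnreplace substitutes '?' and reports each substitution as a UnicodeWarning through the warnings module.

```python
import codecs
import warnings

def hexescape(exc):
    """Hypothetical decode-side handler: each bad byte becomes the text \\xNN."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\\x%02x" % b for b in bad), exc.end

def warnreplace(exc):
    """Hypothetical 'warnreplace': substitute '?' and issue a UnicodeWarning."""
    if not isinstance(exc, (UnicodeDecodeError, UnicodeEncodeError)):
        raise exc
    warnings.warn("%s codec: %s" % (exc.encoding, exc.reason), UnicodeWarning)
    # One '?' per offending byte (decoding) or character (encoding).
    return "?" * (exc.end - exc.start), exc.end

codecs.register_error("hexescape", hexescape)
codecs.register_error("warnreplace", warnreplace)

# Existing encode-side behaviour, for comparison.
assert "\xf6".encode("ascii", "backslashreplace") == b"\\xf6"

# Decode-side counterpart: a Latin-1 name read as UTF-8 stays readable.
assert b"Andr\xe9".decode("utf-8", "hexescape") == "Andr\\xe9"

# warnreplace is lossy, but the loss is reported instead of passing silently.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert "Andr\xe9".encode("ascii", "warnreplace") == b"Andr?"
assert len(caught) == 1 and caught[0].category is UnicodeWarning
```

Python 3.5 and later also accept 'backslashreplace' when decoding, which covers the hexescape case without a custom handler.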