On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-12-08 21:45, Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Such application specific error handlers could then also apply
>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>> to Unicode escapes, private code points, etc. as seen fit by the
>>> application.
>>
>> I'd argue that such fancy round-trip safe error handler should be provided by
>> Python. It's not reasonable to expect application coders to come up with their
>> own codec variation based on subtle details of the unicode spec.
>
> Fair enough. We could add some e.g.
>
>  * a round-trip safe escape error handler that uses a Unicode private
>   code point area which we officially reserve for the Python
>   interpreter

This would of course alter the behaviour of those private code points,
preventing them from round-tripping properly.

I don't think round-tripping can be done from an error handler.  You
need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
bytes.  This has long since gotten repetitive.
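
A minimal sketch of both points, assuming Python 3. The handler name
'pua-escape-demo' and the base code point U+E000 are made up for this
illustration; nothing like them is reserved by Python.

```python
import codecs

# Latin-1 maps every byte 0x00-0xFF to the code point with the same value,
# so decoding and re-encoding is lossless for arbitrary bytes.
raw = bytes(range(256))
assert raw.decode('latin-1').encode('latin-1') == raw

PUA_BASE = 0xE000  # hypothetical "reserved" private-use area

def pua_escape(exc):
    """Hypothetical decode error handler: map each bad byte into the PUA."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return ''.join(chr(PUA_BASE + b) for b in bad), exc.end

codecs.register_error('pua-escape-demo', pua_escape)

# An undecodable byte and a genuine private-use character in valid UTF-8
# decode to the same string, so the original bytes can no longer be
# recovered unambiguously.
from_bad_byte = b'\xe9'.decode('utf-8', 'pua-escape-demo')
from_real_pua = '\ue0e9'.encode('utf-8').decode('utf-8', 'pua-escape-demo')
assert from_bad_byte == from_real_pua == '\ue0e9'
```

The collision in the last three lines is the problem: the handler only
sees the failures, so it cannot escape the private code points that
decode successfully.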


>  * a human readable escape error handler that encodes the problem
>   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
>   encoded directory name instead of failing

Similar to 'ö'.encode('ascii', 'backslashreplace')?  I'm +1 on making that work.
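
For illustration, here is the existing encode-side behaviour next to a
rough sketch of the decode-side handler being asked for. The name
'hexescape' is hypothetical and only registered here for the demo; it is
not an existing handler.

```python
import codecs

# Encoding side: this already works.
encoded = 'ö'.encode('ascii', 'backslashreplace')   # b'\\xf6'

def hexescape(exc):
    """Hypothetical decode error handler: show bad bytes as \\xNN escapes."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return ''.join('\\x%02x' % b for b in bad), exc.end

codecs.register_error('hexescape', hexescape)

# A Latin-1 encoded name decoded as UTF-8 no longer fails; the bad byte
# shows up as a readable escape instead.
name = b'Andr\xe9'.decode('utf-8', 'hexescape')
print(name)
```

This is for display only. The result is not round-trip safe, since a
literal backslash sequence in a valid name looks the same as an escape.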


>  * a warning error handler that replaces the problem cases with
>   a question mark and issues a warning through the warning
>   framework

I dub thee errors='warnreplace'.
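
A rough sketch of what such a handler might look like, assuming it is
registered through codecs.register_error. The handler body, the choice
of UnicodeWarning as the category, and the message text are my
assumptions, not an agreed design.

```python
import codecs
import warnings

def warnreplace(exc):
    """Hypothetical handler: replace problem cases with '?' and warn."""
    if not isinstance(exc, (UnicodeDecodeError, UnicodeEncodeError)):
        raise exc
    bad = exc.object[exc.start:exc.end]
    warnings.warn("replaced %r with '?' (%s)" % (bad, exc.reason),
                  UnicodeWarning, stacklevel=2)
    # One question mark per offending byte or character.
    return '?' * (exc.end - exc.start), exc.end

codecs.register_error('warnreplace', warnreplace)

# Capture the warning so the demo can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    decoded = b'Andr\xe9'.decode('utf-8', 'warnreplace')

print(decoded, len(caught))
```

Going through the warnings framework means the application can silence
the message, log it, or turn it into an exception with the usual filters.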


-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev