Alan Franzoni wrote:
> Hello,
> I think I'm missing some piece here.
>
> I'm trying to register a default error handler for handling exceptions
> for preventing encoding/decoding errors (I know how this works and that
> making this global is probably not a good practice, but I found this
> strange behaviour while writing a proof of concept of how to let Python
> work in a more forgiving way).
>
> What I discovered is that register_error() for "strict" seems to work in
> the way I expect for string decoding, not for unicode encoding.
>
> That's what happens on Mac, Python 2.7.1 from Apple:
>
> melquiades:tmp alan$ cat minimal_test_encode.py
> # -*- coding: utf-8 -*-
>
> import codecs
>
> def handle_encode(e):
>     return ("ASD", e.end)
>
> codecs.register_error("strict", handle_encode)
>
> print u"à".encode("ascii")
>
> melquiades:tmp alan$ python minimal_test_encode.py
> Traceback (most recent call last):
>   File "minimal_test_encode.py", line 10, in <module>
>     u"à".encode("ascii")
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in
> position 0: ordinal not in range(128)
>
>
> OTOH this works properly:
>
> melquiades:tmp alan$ cat minimal_test_decode.py
> # -*- coding: utf-8 -*-
>
> import codecs
>
> def handle_decode(e):
>     return (u"ASD", e.end)
>
> codecs.register_error("strict", handle_decode)
>
> print "à".decode("ascii")
>
> melquiades:tmp alan$ python minimal_test_decode.py
> ASDASD
>
>
> What piece am I missing? The doc at
> http://docs.python.org/library/codecs.html says " For
> encoding /error_handler/ will be called with a UnicodeEncodeError
> <http://docs.python.org/library/exceptions.html#exceptions.UnicodeEncodeError>
> instance, which contains information about the location of the error.", is
> there any reason why the standard "strict" handler cannot be replaced?

The error handling for the standard error handlers "strict", "replace",
"ignore", and "xmlcharrefreplace" is hardwired; see function
unicode_encode_ucs1 in Objects/unicodeobject.c:

    if (known_errorHandler==-1) {
        if ((errors==NULL) || (!strcmp(errors, "strict")))
            known_errorHandler = 1;
    ...
    switch (known_errorHandler) {
    case 1: /* strict */
        raise_encode_exception(&exc, encoding, unicode, collstart, collend, reason);
        goto onError;

You need another gun to shoot yourself in the foot ;)
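
Since the built-in names are short-circuited in the encoder, one way around this is to register the handler under a name of its own and pass that name explicitly. The following is only a minimal sketch: the name "forgiving" is made up for illustration, and the handler body follows the one from the original post. It does not change the default, so every encode() call that should use it has to name it. It is written so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
import codecs


def handle_encode(e):
    # Only handle encoding errors; re-raise anything else unchanged.
    if not isinstance(e, UnicodeEncodeError):
        raise e
    # Substitute "ASD" for the unencodable span and resume after it.
    return (u"ASD", e.end)


# "forgiving" is a hypothetical name, chosen so that it does not collide
# with the hardwired handlers ("strict", "replace", "ignore",
# "xmlcharrefreplace").
codecs.register_error("forgiving", handle_encode)

# The handler is looked up by name, so it must be passed explicitly.
result = u"\xe0".encode("ascii", "forgiving")
print(result)
```

Because the name is not one of the hardwired ones, the encoder falls through to the registry lookup and the callback is actually invoked.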