[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

Patch for the final version that I'm about to commit.

- I realised the exception chaining would only trigger for the encode() and decode() methods, when it was equally applicable to the codecs.encode() and codecs.decode() functions, so I updated the test cases and moved it accordingly.
- reworded the What's New text to better clarify the historical confusion around the nature of the codecs module that these changes are intended to rectify (since the intent is clear from the existence of codecs.encode and codecs.decode and their coverage in the test suite since Python 2.4).

--
stage: needs patch -> commit review
Added file: http://bugs.python.org/file32595/issue17828_improved_codec_errors_v7.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17828
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17828] More informative error handling when encoding and decoding
Roundup Robot added the comment:

New changeset 854a2cea31b9 by Nick Coghlan in branch 'default':
Close #17828: better handling of codec errors
http://hg.python.org/cpython/rev/854a2cea31b9

--
nosy: +python-dev
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed
[issue17828] More informative error handling when encoding and decoding
Roundup Robot added the comment: New changeset 99ba1772c469 by Christian Heimes in branch 'default': Issue #17828: va_start() must be accompanied by va_end() http://hg.python.org/cpython/rev/99ba1772c469
[issue17828] More informative error handling when encoding and decoding
Roundup Robot added the comment: New changeset 26121ae22016 by Christian Heimes in branch 'default': Issue #17828: _PyObject_GetDictPtr() may return NULL instead of a PyObject** http://hg.python.org/cpython/rev/26121ae22016
[issue17828] More informative error handling when encoding and decoding
Christian Heimes added the comment: Coverity has found two issues in your patch. I have fixed them both. -- nosy: +christian.heimes
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

Updated patch (v5) with a more robust chaining mechanism provided as a private _PyErr_TrySetFromCause API. This version eliminates the previous whitelist in favour of checking directly for the ability to replace the exception with another instance of the same type without losing information.

This version also has more direct tests of the exception wrapping behaviour as a dedicated test class.

If I don't hear any objections in the next couple of days, I plan to commit this version.

--
Added file: http://bugs.python.org/file32561/issue17828_improved_codec_errors_v5.diff
[issue17828] More informative error handling when encoding and decoding
Marc-Andre Lemburg added the comment:

On 10.11.2013 14:03, Nick Coghlan wrote:
> Updated patch (v5) with a more robust chaining mechanism provided as a private _PyErr_TrySetFromCause API. This version eliminates the previous whitelist in favour of checking directly for the ability to replace the exception with another instance of the same type without losing information.
>
> This version also has more direct tests of the exception wrapping behaviour as a dedicated test class.
>
> If I don't hear any objections in the next couple of days, I plan to commit this version.

This doesn't look right:

diff -r 1ee45eb6aab9 Include/pyerrors.h
--- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
+++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
...
+PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
+const char *prefix_format, /* ASCII-encoded string */
+...
+);

BTW: Why don't we make that API a public one? It could be useful in C extensions as well.

In the error messages, I'd use codecs.encode() and codecs.decode() (i.e. with parens) instead of codecs.encode and codecs.decode.
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

On 10 November 2013 23:21, Marc-Andre Lemburg rep...@bugs.python.org wrote:
> Marc-Andre Lemburg added the comment:
>
> On 10.11.2013 14:03, Nick Coghlan wrote:
>> Updated patch (v5) with a more robust chaining mechanism provided as a private _PyErr_TrySetFromCause API. This version eliminates the previous whitelist in favour of checking directly for the ability to replace the exception with another instance of the same type without losing information.
>>
>> This version also has more direct tests of the exception wrapping behaviour as a dedicated test class.
>>
>> If I don't hear any objections in the next couple of days, I plan to commit this version.
>
> This doesn't look right:
>
> diff -r 1ee45eb6aab9 Include/pyerrors.h
> --- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
> +++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
> ...
> +PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
> +const char *prefix_format, /* ASCII-encoded string */
> +...
> +);

The signature? That API doesn't currently let you change the exception type, only the message (since the codecs machinery doesn't need to change the exception type, and changing the exception type is fraught with peril from a backwards compatibility point of view).

> BTW: Why don't we make that API a public one? It could be useful in C extensions as well.

Because I'm not sure it's a good idea in general and hence am wary of promoting it too much at this point in time (especially given the severe limitations of what it can currently wrap). I'm convinced it's worth it in this particular case (since being told the codec involved directly makes the meaning of codec errors much clearer and even with the limitations it can still wrap most errors from standard library codecs), and the implementation *has* to be in exceptions.c since it pokes around comparing the exception details to the internals of BaseException to figure out if it can safely wrap the exception or not.

Issue 18861 also makes me wonder if there's an underlying structural problem in the way exception chaining currently works that could be better solved by making it possible to annotate traceback frames while unwinding the stack, which also makes me disinclined to add to the public C API in this area before 3.5.
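The "can this exception be safely replaced" idea discussed in this message can be sketched in Python. This is only a rough approximation under stated assumptions: the real helper is described as C code in exceptions.c that inspects BaseException internals, which are not reachable from Python, so this sketch cannot detect types that carry extra C-level state (OSError, for example). The function name try_wrap and the message layout are hypothetical.

```python
def try_wrap(exc, prefix):
    """Return a chained replacement for exc with a prefixed message, or None.

    Hypothetical approximation: wrap only when rebuilding the exception
    from a single message string would not obviously lose information.
    """
    exc_type = type(exc)
    # More than one argument, or extra instance attributes, would be lost.
    if len(exc.args) > 1 or vars(exc):
        return None
    message = "{} ({}: {})".format(prefix, exc_type.__name__, exc)
    try:
        # Types with a custom constructor may reject a lone message string.
        wrapped = exc_type(message)
    except Exception:
        return None
    # A constructor that rewrites its arguments is treated as unsafe.
    if wrapped.args != (message,):
        return None
    wrapped.__cause__ = exc  # same effect as "raise wrapped from exc"
    return wrapped


original = TypeError("bad input")
wrapped = try_wrap(original, "encoding with 'demo' codec failed")
```

Because the replacement has the same type as the original, existing except clauses keep matching, which is the backwards compatibility concern raised in the message.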
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

On 10 November 2013 23:21, Marc-Andre Lemburg rep...@bugs.python.org wrote:
> This doesn't look right:
>
> diff -r 1ee45eb6aab9 Include/pyerrors.h
> --- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
> +++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
> ...
> +PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
> +const char *prefix_format, /* ASCII-encoded string */
> +...
> +);

After sending my previous reply, I realised you may have been referring to the comment. I copied that from the PyErr_Format signature. According to http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromFormat, the format string still has to be ASCII-encoded, and if that's no longer true, it's a separate bug from this one that will require a docs fix as well.

> In the error messages, I'd use codecs.encode() and codecs.decode() (i.e. with parens) instead of codecs.encode and codecs.decode.

Forgot to reply to this part - I like it, will switch it over before committing.
[issue17828] More informative error handling when encoding and decoding
Changes by Nick Coghlan ncogh...@gmail.com: Added file: http://bugs.python.org/file32562/issue17828_improved_codec_errors_v6.diff
[issue17828] More informative error handling when encoding and decoding
Marc-Andre Lemburg added the comment:

On 10.11.2013 15:39, Nick Coghlan wrote:
> On 10 November 2013 23:21, Marc-Andre Lemburg rep...@bugs.python.org wrote:
>> This doesn't look right:
>>
>> diff -r 1ee45eb6aab9 Include/pyerrors.h
>> --- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
>> +++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
>> ...
>> +PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
>> +const char *prefix_format, /* ASCII-encoded string */
>> +...
>> +);

Sorry about the false warning. After looking at those lines again, I realized that the ... is the argument ellipsis, not some omitted code. At first this looked like a function definition to me :-)

> After sending my previous reply, I realised you may have been referring to the comment. I copied that from the PyErr_Format signature. According to http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromFormat, the format string still has to be ASCII-encoded, and if that's no longer true, it's a separate bug from this one that will require a docs fix as well.

Also note that it's not clear whether the ASCII refers to the format string or the resulting formatted string. For the format string, ASCII would probably be fine, but for the formatted string, UTF-8 should be allowed, since it's not uncommon to add e.g. parameter strings that caused the error to the error string.

That's a separate ticket, though.

>> In the error messages, I'd use codecs.encode() and codecs.decode() (i.e. with parens) instead of codecs.encode and codecs.decode.
>
> Forgot to reply to this part - I like it, will switch it over before committing.

Thanks.
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

New and improved implementation attached that extracts the exception chaining to a helper function and calls it only when it is the call in to the codecs machinery that failed (eliminating the need for the output flag, and covering decoding as well as encoding).

TypeError, AttributeError and ValueError are all wrapped with chained exceptions that mention the codec that failed.

(Annoyingly, bz2_codec throws OSError instead of ValueError for bad input data, but wrapping OSError safely is a pain due to the extra state potentially carried on instances. So letting it escape unwrapped is the simpler and more conservative option at this point.)

>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types

>>> b"hello".decode("rot_13")
AttributeError: 'memoryview' object has no attribute 'translate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: decoding with 'rot_13' codec failed (AttributeError: 'memoryview' object has no attribute 'translate')

>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoding with 'bz2_codec' codec failed (TypeError: 'str' does not support the buffer interface)

>>> "hello".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types

--
Added file: http://bugs.python.org/file32508/issue17828_improved_codec_errors_v3.diff
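As a hedged illustration of how calling code might inspect such a failure, the sketch below reports the exception type, its message, and the chained cause when one is present. The helper name codec_failure_details is hypothetical, and the exact message text and whether __cause__ is populated are assumed to depend on the interpreter version, so the sketch treats both as optional.

```python
import codecs


def codec_failure_details(func, obj, codec):
    """Run func(obj, codec) and summarise any exception it raises.

    Returns None on success. The "cause" entry is whatever __cause__ holds,
    which may be None on interpreters that do not chain codec errors.
    """
    try:
        func(obj, codec)
    except Exception as exc:
        return {
            "type": type(exc).__name__,
            "message": str(exc),
            "cause": exc.__cause__,
        }
    return None


# str input to a bytes-to-bytes codec, as in the bz2_codec example above.
details = codec_failure_details(codecs.encode, "hello", "bz2_codec")
print(details["type"], "-", details["message"])
```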
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

Checking the other binary-to-binary and str-to-str codecs with input type and value restrictions:

- they all throw TypeError and get wrapped appropriately when asked to encode str input (rot_13 throws the output type error)
- rot_13 throws an appropriately wrapped AttributeError when asked to decode a bytes or bytearray object

For bad value input, uu_codec is the only one that throws a normal ValueError. I couldn't figure out a way to get quopri_codec to complain about the input value, and the others throw a module-specific error:

- binascii (base64_codec, hex_codec) throws binascii.Error (a custom ValueError subclass)
- zlib (zlib_codec) throws zlib.error (inherits directly from Exception)

As with the OSError that escapes from bz2_codec, I think the simplest and most conservative option is to not worry about those at this point.
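The exception types listed above can be checked with a short script. This is a sketch using only the standard library; the helper name error_type_for and the sample inputs are illustrative choices, not part of the patch.

```python
import binascii
import codecs
import zlib


def error_type_for(data, codec):
    """Return the type of the exception raised when decoding data, or None."""
    try:
        codecs.decode(data, codec)
    except Exception as exc:
        return type(exc)
    return None


# Deliberately invalid payloads for each binary codec.
hex_error = error_type_for(b"zz", "hex_codec")
zlib_error = error_type_for(b"not zlib data", "zlib_codec")
bz2_error = error_type_for(b"not bz2 data", "bz2_codec")

for name, err in [("hex_codec", hex_error),
                  ("zlib_codec", zlib_error),
                  ("bz2_codec", bz2_error)]:
    print(name, err.__name__, "ValueError subclass:", issubclass(err, ValueError))
```

Only the binascii error is a ValueError subclass, which is why a handler written for ValueError would not see the zlib or bz2 failures.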
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

Updated patch adds systematic tests for the new error handling to test_codecs.TransformTests.

I also moved the codecs changes up to a "Codec handling improvements" section. My rationale for doing that is that this is actually a pretty significant usability enhancement and Python 3 codec model clarification for heavy users of binary codecs coming from Python 2, and because I also plan to follow up on this issue by bringing back the shorthand aliases for these codecs that were removed in issue 10807 (thus closing issue 7475).

If issue 15216 gets finished (changing stream encodings after creation) that would also be a substantial enhancement worth mentioning here.

--
Added file: http://bugs.python.org/file32509/issue17828_improved_codec_errors_v4.diff
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

Updated patch. The results of this suggest to me that the input wrappers are likely infeasible at this point in time, but improving the errors for the wrong *output* type is entirely feasible.

Since the main conversion we need to prompt is things like binary_object.decode(binary_codec) -> codecs.decode(binary_object, binary_codec), I suggest we limit the scope of this issue to that part of the problem.

>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types

>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'bz2_codec' codec (TypeError: 'str' does not support the buffer interface)

>>> "hello".encode("rot_13")
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'rot_13' codec (TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types)

--
Added file: http://bugs.python.org/file32496/issue17828_improved_codec_errors.diff
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

Ah, came up with a relatively simple solution based on an internal helper function with an optional output flag:

>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types

>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'bz2_codec' codec (TypeError: 'str' does not support the buffer interface)

>>> "hello".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types

--
Added file: http://bugs.python.org/file32497/issue17828_improved_codec_errors_v2.diff
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment: The other thing is that this patch doesn't wrap AttributeError. I'm OK with that, since I believe the only codec in the standard library that currently throws that for a bad input type is rot_13.
[issue17828] More informative error handling when encoding and decoding
STINNER Victor added the comment: It would be simpler to just drop these custom codecs (rot13, bz2, hex, etc.) instead of helping to use them :-) -- nosy: +haypo
[issue17828] More informative error handling when encoding and decoding
Marc-Andre Lemburg added the comment:

On 04.11.2013 14:30, STINNER Victor wrote:
> It would be simpler to just drop these custom codecs (rot13, bz2, hex, etc.) instead of helping to use them :-)

-1 for the same reasons I keep repeating over and over and over again :-)

The codec system was designed to work obj -> obj. Python 3 limits the types for the bytes/str helper methods, but that limitation does not extend to the codec design.

+1 on having better error messages.

In the long run, we should add supported input/output type information to codecs, so that error reporting and codec introspection becomes easier.

--
Marc-Andre Lemburg
eGenix.com

--
nosy: +lemburg
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

I think I figured out a better way to structure this that avoids the need for the output flag and is more easily expanded to whitelist additional exception types as safe to wrap. I'll try to come up with a new patch tonight.

--
assignee:  -> ncoghlan
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment: Ezio pointed out on IRC that the extra type checks in str.encode, bytes.decode and bytearray.decode should reference the appropriate codecs module function in addition to the codec in use. So if str.encode produces something other than bytes, it should reference codecs.encode, while the binary decoding methods should mention codecs.decode if they produce something other than str.
[issue17828] More informative error handling when encoding and decoding
Ezio Melotti added the comment:

The attached patch changes the error message of str.encode/bytes.decode when the codec returns the wrong type:

>>> import codecs
>>> 'example'.encode('rot_13')
TypeError: encoder returned 'str' instead of 'bytes', use codecs.decode for str->str conversions
>>> codecs.encode('example', 'rot_13')
'rknzcyr'
>>> b'000102'.decode('hex_codec')
TypeError: decoder returned 'bytes' instead of 'str', use codecs.encode for bytes->bytes conversions
>>> codecs.decode(b'000102', 'hex_codec')
b'\x00\x01\x02'

This only solves part of the problem though, because individual codecs might raise other errors if the input type is wrong:

>>> 'example'.encode('hex_codec')
Traceback (most recent call last):
  File "/home/wolf/dev/py/py3k/Lib/encodings/hex_codec.py", line 16, in hex_encode
    return (binascii.b2a_hex(input), len(input))
TypeError: 'str' does not support the buffer interface

--
keywords: +patch
Added file: http://bugs.python.org/file30189/issue17828-1.diff
[issue17828] More informative error handling when encoding and decoding
Ezio Melotti added the comment:

To summarize:
 * str.encode does only str -> bytes;
 * bytes.decode does only bytes -> str;
 * codecs.encode/decode do obj -> obj;

The things that could go wrong are:
 1) the input type is wrong (i.e. the codec doesn't accept the type of the input);
 2) the input value is invalid;
 3) for str.encode/bytes.decode only, the output type is wrong (i.e. the codec returned a non-bytes/non-str object);

My patch only covers 3. The four new exceptions suggested by Nick in msg187704 would cover the first 2 cases.

For str.encode/bytes.decode, if we knew the input type accepted by the codec we could also provide a better error message (e.g. "codecs accepts '...', not '...'; use ... instead"), but we don't.
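The first three points of this summary can be shown with a minimal standard-library sketch. The codec names rot_13 and hex_codec come from this thread; the variable names are illustrative only.

```python
import codecs

# codecs.encode()/codecs.decode() are not restricted to str <-> bytes,
# so they can drive the str -> str and bytes -> bytes codecs directly.
rotated = codecs.encode("example", "rot_13")    # str -> str
raw = codecs.decode(b"000102", "hex_codec")     # bytes -> bytes

# The convenience methods handle the text case only.
text_bytes = "example".encode("utf-8")          # str -> bytes
text = text_bytes.decode("utf-8")               # bytes -> str

print(rotated, raw, text)
```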
[issue17828] More informative error handling when encoding and decoding
Ezio Melotti added the comment:

The attached proof of concept catches Type/ValueError in str.encode and raises another exception with a better message:

>>> 'example'.encode('hex_codec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for hex_codec codec ('str' does not support the buffer interface)

(Note: the patch doesn't handle the exception chaining yet and probably leaks.)

If Nick's proposal in msg187704 is accepted, this should become a codecs.EncodeTypeError. The same should then be done for bytes.decode and for codecs.encode/decode.

--
Added file: http://bugs.python.org/file30190/issue17828-2.diff
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment: I tracked down the proximate cause of the weird exception in the bytes.decode case: the base64 module only accepts bytes and bytearray objects, instead of using memoryview to accept anything that supports the buffer API and provides a C-contiguous 8-bit view of the underlying data. Raised as issue 17839.
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment:

Here's an example of the specific type errors raised by additional checks in the text-encoding specific methods. I believe the main improvement needed here is to mention the encoding name in the exception message:

>>> "example".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoder did not return a bytes object (type=str)

>>> b'BZh91AYSY\xc1uvK\x00\x00\x01F\x80\x00\x10\x00\x04\x00\x00\x10 \x000\xcd\x00\xc1\xa0P\xe2\xeeH\xa7\n\x12\x18.\xae\xc9`'.decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder did not return a str object (type=bytes)
[issue17828] More informative error handling when encoding and decoding
Changes by Barry A. Warsaw ba...@python.org: -- nosy: +barry
[issue17828] More informative error handling when encoding and decoding
New submission from Nick Coghlan:

Passing the wrong types to codecs can currently lead to rather confusing exceptions, like:

>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/base64_codec.py", line 20, in base64_decode
    return (base64.decodebytes(input), len(input))
  File "/usr/lib64/python3.2/base64.py", line 359, in decodebytes
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not memoryview

>>> codecs.decode("example", "utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface

This situation could be improved by having the affected APIs use the exception chaining system to wrap these errors in a more informative exception that also displays information on the codec involved. Note that UnicodeEncodeError and UnicodeDecodeError are not appropriate, as those are specific to text encoding operations, while these new wrappers will apply to arbitrary codecs, regardless of whether or not they use the unicode error handlers. Furthermore, for backwards compatibility with existing exception handling, it is probably necessary to limit ourselves to specific exception types and ensure that the wrapper exceptions are subclasses of those types.

These new wrappers would have __cause__ set to the exception raised by the codec, but emit a message more along the lines of the following:

==
codecs.DecodeTypeError: encoding='utf8', details="TypeError: 'str' does not support the buffer interface"
==

Wrapping TypeError and ValueError should cover most cases, which would mean four new exception types in the codecs module:

Raised by codecs.decode, bytes.decode and bytearray.decode:
* codecs.DecodeTypeError
* codecs.DecodeValueError

Raised by codecs.encode, str.encode:
* codecs.EncodeTypeError
* codecs.EncodeValueError

Instances of UnicodeError wouldn't be wrapped, since they already contain codec information.

--
components: Library (Lib)
messages: 187704
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: More informative error handling when encoding and decoding
versions: Python 3.4
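The wrapper idea proposed in this message could look roughly like the following in pure Python. This is a hypothetical sketch: DecodeTypeError here is a locally defined class named after the proposal, not an attribute of the codecs module, and decode_with_context is an illustrative helper.

```python
import codecs


class DecodeTypeError(TypeError):
    """Hypothetical wrapper type; subclasses TypeError so old handlers still match."""


def decode_with_context(obj, encoding):
    """Call codecs.decode() and re-raise TypeError with the codec name attached."""
    try:
        return codecs.decode(obj, encoding)
    except TypeError as exc:
        message = "encoding={!r}, details={}: {}".format(
            encoding, type(exc).__name__, exc)
        # "raise ... from" stores the codec's own exception in __cause__.
        raise DecodeTypeError(message) from exc


try:
    decode_with_context("example", "utf8")   # str passed to a bytes decoder
except TypeError as exc:                      # pre-existing handler keeps working
    caught = exc
```

Making the wrapper a subclass of the wrapped type is what keeps existing "except TypeError" blocks working, which is the backwards compatibility constraint named in the message.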
[issue17828] More informative error handling when encoding and decoding
Changes by Ezio Melotti ezio.melo...@gmail.com:

--
nosy: +ezio.melotti
stage:  -> needs patch
type:  -> enhancement
[issue17828] More informative error handling when encoding and decoding
Nick Coghlan added the comment: There may also be some specific improvement to be made to str.encode, bytes.decode and bytearray.decode in relation to the additional type checks they do to enforce the appropriate input and output types (see the bizarre "expected bytes, not memoryview" example).
[issue17828] More informative error handling when encoding and decoding
Changes by Florent Xicluna florent.xicl...@gmail.com: -- nosy: +flox