[issue17828] More informative error handling when encoding and decoding

2013-11-13 Thread Nick Coghlan

Nick Coghlan added the comment:

Patch for the final version that I'm about to commit.

- I realised the exception chaining would only trigger for the encode() and 
decode() methods, when it was equally applicable to the codecs.encode() and 
codecs.decode() functions, so I updated the test cases and moved it accordingly.

- reworded the What's New text to better clarify the historical confusion 
around the nature of the codecs module that these changes are intended to 
rectify (since the intent is clear from the existence of codecs.encode and 
codecs.decode and their coverage in the test suite since Python 2.4).

--
stage: needs patch -> commit review
Added file: 
http://bugs.python.org/file32595/issue17828_improved_codec_errors_v7.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17828
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17828] More informative error handling when encoding and decoding

2013-11-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 854a2cea31b9 by Nick Coghlan in branch 'default':
Close #17828: better handling of codec errors
http://hg.python.org/cpython/rev/854a2cea31b9

--
nosy: +python-dev
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed




[issue17828] More informative error handling when encoding and decoding

2013-11-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 99ba1772c469 by Christian Heimes in branch 'default':
Issue #17828: va_start() must be accompanied by va_end()
http://hg.python.org/cpython/rev/99ba1772c469

--




[issue17828] More informative error handling when encoding and decoding

2013-11-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 26121ae22016 by Christian Heimes in branch 'default':
Issue #17828: _PyObject_GetDictPtr() may return NULL instead of a PyObject**
http://hg.python.org/cpython/rev/26121ae22016

--




[issue17828] More informative error handling when encoding and decoding

2013-11-13 Thread Christian Heimes

Christian Heimes added the comment:

Coverity has found two issues in your patch. I have fixed them both.

--
nosy: +christian.heimes




[issue17828] More informative error handling when encoding and decoding

2013-11-10 Thread Nick Coghlan

Nick Coghlan added the comment:

Updated patch (v5) with a more robust chaining mechanism provided as a private 
_PyErr_TrySetFromCause API. This version eliminates the previous whitelist in 
favour of checking directly for the ability to replace the exception with 
another instance of the same type without losing information.

This version also has more direct tests of the exception wrapping behaviour as 
a dedicated test class.
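
The idea of checking whether an exception can be replaced by another instance of the same type without losing information can be approximated at the Python level. This is only a rough sketch: the helper name `looks_safe_to_rewrap` is hypothetical, and the individual tests (instance layout size, shape of `args`, empty instance dict) are assumptions about what such a check needs, not a transcription of the C code in the patch.

```python
def looks_safe_to_rewrap(exc):
    """Hypothetical helper: guess whether `exc` carries only a message.

    Assumption: an exception is safe to re-create with a new message when
    its type stores no extra C-level state, its args are empty or a single
    string, and nothing has been stored on the instance.
    """
    exc_type = type(exc)
    # Extra C-level fields (as on OSError) show up as a larger layout.
    if exc_type.__basicsize__ != BaseException.__basicsize__:
        return False
    if exc_type.__itemsize__ != BaseException.__itemsize__:
        return False
    # args must be empty or contain exactly one string.
    if len(exc.args) > 1:
        return False
    if exc.args and not isinstance(exc.args[0], str):
        return False
    # Any attribute set on the instance would be lost by re-creating it.
    if getattr(exc, "__dict__", None):
        return False
    return True


annotated = ValueError("bad value")
annotated.detail = 42  # extra state stored on the instance

print(looks_safe_to_rewrap(TypeError("bad input")))
print(looks_safe_to_rewrap(OSError("disk trouble")))
print(looks_safe_to_rewrap(annotated))
```

A check along these lines is deliberately conservative: anything it cannot vouch for is left to propagate unwrapped.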

If I don't hear any objections in the next couple of days, I plan to commit 
this version.

--
Added file: 
http://bugs.python.org/file32561/issue17828_improved_codec_errors_v5.diff




[issue17828] More informative error handling when encoding and decoding

2013-11-10 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 10.11.2013 14:03, Nick Coghlan wrote:
 
> Updated patch (v5) with a more robust chaining mechanism provided as a
> private _PyErr_TrySetFromCause API. This version eliminates the previous
> whitelist in favour of checking directly for the ability to replace the
> exception with another instance of the same type without losing information.
>
> This version also has more direct tests of the exception wrapping behaviour
> as a dedicated test class.
>
> If I don't hear any objections in the next couple of days, I plan to commit
> this version.

This doesn't look right:

diff -r 1ee45eb6aab9 Include/pyerrors.h
--- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
+++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
...
+PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
+    const char *prefix_format,   /* ASCII-encoded string  */
+    ...
+    );

BTW: Why don't we make that API a public one ? It could be useful
in C extensions as well.

In the error messages, I'd use codecs.encode() and codecs.decode()
(ie. with parens) instead of codecs.encode and codecs.decode.

--




[issue17828] More informative error handling when encoding and decoding

2013-11-10 Thread Nick Coghlan

Nick Coghlan added the comment:

On 10 November 2013 23:21, Marc-Andre Lemburg rep...@bugs.python.org wrote:

> Marc-Andre Lemburg added the comment:
>
> On 10.11.2013 14:03, Nick Coghlan wrote:
>>
>> Updated patch (v5) with a more robust chaining mechanism provided as a
>> private _PyErr_TrySetFromCause API. This version eliminates the previous
>> whitelist in favour of checking directly for the ability to replace the
>> exception with another instance of the same type without losing information.
>>
>> This version also has more direct tests of the exception wrapping behaviour
>> as a dedicated test class.
>>
>> If I don't hear any objections in the next couple of days, I plan to commit
>> this version.
>
> This doesn't look right:
>
> diff -r 1ee45eb6aab9 Include/pyerrors.h
> --- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
> +++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
> ...
> +PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
> +    const char *prefix_format,   /* ASCII-encoded string  */
> +    ...
> +    );

The signature? That API doesn't currently let you change the exception
type, only the message (since the codecs machinery doesn't need to
change the exception type, and changing the exception type is fraught
with peril from a backwards compatibility point of view).

> BTW: Why don't we make that API a public one ? It could be useful
> in C extensions as well.

Because I'm not sure it's a good idea in general and hence am wary of
promoting it too much at this point in time (especially given the
severe limitations of what it can currently wrap). I'm convinced it's
worth it in this particular case (since being told the codec involved
directly makes the meaning of codec errors much clearer and even with
the limitations it can still wrap most errors from standard library
codecs), and the implementation *has* to be in exceptions.c since it
pokes around comparing the exception details to the internals of
BaseException to figure out if it can safely wrap the exception or
not.

Issue 18861 also makes me wonder if there's an underlying structural
problem in the way exception chaining currently works that could be
better solved by making it possible to annotate traceback frames while
unwinding the stack, which also makes me disinclined to add to the
public C API in this area before 3.5.

--




[issue17828] More informative error handling when encoding and decoding

2013-11-10 Thread Nick Coghlan

Nick Coghlan added the comment:

On 10 November 2013 23:21, Marc-Andre Lemburg rep...@bugs.python.org wrote:

> This doesn't look right:
>
> diff -r 1ee45eb6aab9 Include/pyerrors.h
> --- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
> +++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
> ...
> +PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
> +    const char *prefix_format,   /* ASCII-encoded string  */
> +    ...
> +    );

After sending my previous reply, I realised you may have been
referring to the comment. I copied that from the PyErr_Format
signature. According to
http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromFormat,
the format string still has to be ASCII-encoded, and if that's no
longer true, it's a separate bug from this one that will require a
docs fix as well.

> In the error messages, I'd use codecs.encode() and codecs.decode()
> (ie. with parens) instead of codecs.encode and codecs.decode.

Forgot to reply to this part - I like it, will switch it over before committing.

--




[issue17828] More informative error handling when encoding and decoding

2013-11-10 Thread Nick Coghlan

Changes by Nick Coghlan ncogh...@gmail.com:


Added file: 
http://bugs.python.org/file32562/issue17828_improved_codec_errors_v6.diff




[issue17828] More informative error handling when encoding and decoding

2013-11-10 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 10.11.2013 15:39, Nick Coghlan wrote:
 
> On 10 November 2013 23:21, Marc-Andre Lemburg rep...@bugs.python.org wrote:
>>
>> This doesn't look right:
>>
>> diff -r 1ee45eb6aab9 Include/pyerrors.h
>> --- a/Include/pyerrors.h    Sat Nov 09 23:15:52 2013 +0200
>> +++ b/Include/pyerrors.h    Sun Nov 10 22:54:04 2013 +1000
>> ...
>> +PyAPI_FUNC(PyObject *) _PyErr_TrySetFromCause(
>> +    const char *prefix_format,   /* ASCII-encoded string  */
>> +    ...
>> +    );

Sorry about the false alarm. After looking at those lines
again, I realized that the "..." is the argument ellipsis,
not some omitted code. At first this looked like a function
definition to me :-)

> After sending my previous reply, I realised you may have been
> referring to the comment. I copied that from the PyErr_Format
> signature. According to
> http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromFormat,
> the format string still has to be ASCII-encoded, and if that's no
> longer true, it's a separate bug from this one that will require a
> docs fix as well.

Also note that it's not clear whether the ASCII
refers to the format string or the resulting formatted string.
For the format string, ASCII would probably be fine, but
for the formatted string, UTF-8 should be allowed, since it's
not uncommon to add e.g. parameter strings that caused the
error to the error string.

That's a separate ticket, though.

>> In the error messages, I'd use codecs.encode() and codecs.decode()
>> (ie. with parens) instead of codecs.encode and codecs.decode.
>
> Forgot to reply to this part - I like it, will switch it over before
> committing.

Thanks.

--




[issue17828] More informative error handling when encoding and decoding

2013-11-05 Thread Nick Coghlan

Nick Coghlan added the comment:

New and improved implementation attached that extracts the exception chaining 
to a helper function and calls it only when it is the call into the codecs 
machinery that failed (eliminating the need for the output flag, and covering 
decoding as well as encoding).

TypeError, AttributeError and ValueError are all wrapped with chained 
exceptions that mention the codec that failed.

(Annoyingly, bz2_codec throws OSError instead of ValueError for bad input data, 
but wrapping OSError safely is a pain due to the extra state potentially 
carried on instances. So letting it escape unwrapped is the simpler and more 
conservative option at this point)

>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types

>>> b"hello".decode("rot_13")
AttributeError: 'memoryview' object has no attribute 'translate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: decoding with 'rot_13' codec failed (AttributeError: 'memoryview' object has no attribute 'translate')

>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoding with 'bz2_codec' codec failed (TypeError: 'str' does not support the buffer interface)

>>> "hello".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types

--
Added file: 
http://bugs.python.org/file32508/issue17828_improved_codec_errors_v3.diff




[issue17828] More informative error handling when encoding and decoding

2013-11-05 Thread Nick Coghlan

Nick Coghlan added the comment:

Checking the other binary->binary and str->str codecs with input type and 
value restrictions:

- they all throw TypeError and get wrapped appropriately when asked to encode 
str input (rot_13 throws the output type error)

- rot_13 throws an appropriately wrapped AttributeError when asked to decode 
bytes or bytearray object

For bad value input, uu_codec is the only one that throws a normal 
ValueError. I couldn't figure out a way to get quopri_codec to complain about 
the input value, and the others throw a module-specific error:

binascii (base64_codec, hex_codec) throws binascii.Error (a custom 
ValueError subclass)
zlib (zlib_codec) throws zlib.error (inherits directly from Exception)

As with the OSError that escapes from bz2_codec, I think the simplest and most 
conservative option is to not worry about those at this point.
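
The class relationships mentioned above can be checked directly from Python. The snippet below is a small sketch; `failure_type` is a hypothetical helper introduced only for illustration, and the sample inputs are arbitrary invalid data chosen to make the codecs fail.

```python
import binascii
import codecs
import zlib

# binascii.Error derives from ValueError, while zlib.error derives
# directly from Exception, so "except ValueError" only catches the former.
print(issubclass(binascii.Error, ValueError))
print(issubclass(zlib.error, ValueError))


def failure_type(data, encoding):
    """Hypothetical helper: return the type of exception raised by
    codecs.decode(data, encoding), or None if decoding succeeds."""
    try:
        codecs.decode(data, encoding)
    except Exception as exc:
        return type(exc)
    return None


print(failure_type(b"zz", "hex_codec"))
print(failure_type(b"not zlib data", "zlib_codec"))
```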

--




[issue17828] More informative error handling when encoding and decoding

2013-11-05 Thread Nick Coghlan

Nick Coghlan added the comment:

Updated patch adds systematic tests for the new error handling to 
test_codecs.TransformTests.

I also moved the codecs changes up to a "Codec handling improvements" section.

My rationale for doing that is that this is actually a pretty significant 
usability enhancement and Python 3 codec model clarification for heavy users of 
binary codecs coming from Python 2, and because I also plan to follow up on 
this issue by bringing back the shorthand aliases for these codecs that were 
removed in issue 10807 (thus closing issue 7475).

If issue 15216 gets finished (changing stream encodings after creation) that 
would also be a substantial enhancement worth mentioning here.

--
Added file: 
http://bugs.python.org/file32509/issue17828_improved_codec_errors_v4.diff




[issue17828] More informative error handling when encoding and decoding

2013-11-04 Thread Nick Coghlan

Nick Coghlan added the comment:

Updated patch. The results of this suggest to me that the input wrappers are 
likely infeasible at this point in time, but improving the errors for the wrong 
*output* type is entirely feasible. Since the main conversion we need to prompt 
is things like binary_object.decode(binary_codec) -> 
codecs.decode(binary_object, binary_codec), I suggest we limit the scope of 
this issue to that part of the problem.

>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types
>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'bz2_codec' codec (TypeError: 'str' does not support the buffer interface)
>>> "hello".encode("rot_13")
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'rot_13' codec (TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types)

--
Added file: 
http://bugs.python.org/file32496/issue17828_improved_codec_errors.diff




[issue17828] More informative error handling when encoding and decoding

2013-11-04 Thread Nick Coghlan

Nick Coghlan added the comment:

Ah, came up with a relatively simple solution based on an internal helper 
function with an optional output flag:

>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types

>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'bz2_codec' codec (TypeError: 'str' does not support the buffer interface)

>>> "hello".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types

--
Added file: 
http://bugs.python.org/file32497/issue17828_improved_codec_errors_v2.diff




[issue17828] More informative error handling when encoding and decoding

2013-11-04 Thread Nick Coghlan

Nick Coghlan added the comment:

The other thing is that this patch doesn't wrap AttributeError. I'm OK with 
that, since I believe the only codec in the standard library that currently 
throws that for a bad input type is rot_13.

--




[issue17828] More informative error handling when encoding and decoding

2013-11-04 Thread STINNER Victor

STINNER Victor added the comment:

It would be simpler to just drop these custom codecs (rot13, bz2, hex, etc.) 
instead of helping to use them :-)

--
nosy: +haypo




[issue17828] More informative error handling when encoding and decoding

2013-11-04 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 04.11.2013 14:30, STINNER Victor wrote:
 
> It would be simpler to just drop these custom codecs (rot13, bz2, hex, etc.)
> instead of helping to use them :-)

-1 for the same reasons I keep repeating over and over and over again :-)

The codec system was designed to work obj->obj. Python 3 limits the types
for the bytes/str helper methods, but that limitation does not extend
to the codec design.

+1 on having better error messages. In the long run, we should add
supported input/output type information to codecs, so that error
reporting and codec introspection becomes easier.

-- 
Marc-Andre Lemburg
eGenix.com


--
nosy: +lemburg




[issue17828] More informative error handling when encoding and decoding

2013-11-04 Thread Nick Coghlan

Nick Coghlan added the comment:

I think I figured out a better way to structure this that avoids the need for 
the output flag and is more easily expanded to whitelist additional exception 
types as safe to wrap.

I'll try to come up with a new patch tonight.

--
assignee:  -> ncoghlan




[issue17828] More informative error handling when encoding and decoding

2013-05-09 Thread Nick Coghlan

Nick Coghlan added the comment:

Ezio pointed out on IRC that the extra type checks in str.encode, bytes.decode 
and bytearray.decode should reference the appropriate codecs module function in 
addition to the codec in use.

So if str.encode produces something other than bytes, it should reference 
codecs.encode, while the binary decoding methods should mention codecs.decode 
if they produce something other than str.
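
As an illustration of the kind of output type check being discussed, here is a minimal Python sketch. The function `text_encode` is hypothetical (the real checks live in the C implementation of str.encode), and the message wording is modelled on the messages quoted elsewhere in this thread rather than copied from the final patch.

```python
import codecs


def text_encode(text, encoding):
    """Hypothetical stand-in for str.encode's output type check.

    Runs the codec, then insists on a bytes result and points the caller
    at codecs.encode() when the codec produced something else.
    """
    result = codecs.encode(text, encoding)
    if not isinstance(result, bytes):
        raise TypeError(
            "{!r} encoder returned {!r} instead of 'bytes'; "
            "use codecs.encode() to encode to arbitrary types".format(
                encoding, type(result).__name__
            )
        )
    return result


print(text_encode("hello", "utf-8"))
try:
    text_encode("hello", "rot_13")
except TypeError as exc:
    print(exc)
```

Naming both the codec and the alternative function in the message is what lets a reader act on the error without knowing how the codecs machinery works.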

--




[issue17828] More informative error handling when encoding and decoding

2013-05-09 Thread Ezio Melotti

Ezio Melotti added the comment:

The attached patch changes the error message of str.encode/bytes.decode when 
the codec returns the wrong type:

>>> import codecs
>>> 'example'.encode('rot_13')
TypeError: encoder returned 'str' instead of 'bytes', use codecs.decode for str->str conversions
>>> codecs.encode('example', 'rot_13')
'rknzcyr'

>>> b'000102'.decode('hex_codec')
TypeError: decoder returned 'bytes' instead of 'str', use codecs.encode for bytes->bytes conversions
>>> codecs.decode(b'000102', 'hex_codec')
b'\x00\x01\x02'

This only solves part of the problem though, because individual codecs might 
raise other errors if the input type is wrong:
>>> 'example'.encode('hex_codec')
Traceback (most recent call last):
  File "/home/wolf/dev/py/py3k/Lib/encodings/hex_codec.py", line 16, in hex_encode
    return (binascii.b2a_hex(input), len(input))
TypeError: 'str' does not support the buffer interface

--
keywords: +patch
Added file: http://bugs.python.org/file30189/issue17828-1.diff




[issue17828] More informative error handling when encoding and decoding

2013-05-09 Thread Ezio Melotti

Ezio Melotti added the comment:

To summarize:
 * str.encode does only str->bytes;
 * bytes.decode does only bytes->str;
 * codecs.encode/decode do obj->obj;

The things that could go wrong are:
 1) the input type is wrong (i.e. the codec doesn't accept the type of the 
input);
 2) the input value is invalid;
 3) for str.encode/bytes.decode only, the output type is wrong (i.e. the codec 
returned a non-bytes/non-str object);

My patch only covers 3.  The four new exceptions suggested by Nick in msg187704 
would cover the first 2 cases.
For str.encode/bytes.decode, if we knew the input type accepted by the codec we 
could also provide a better error message (e.g. codecs accepts '...', not 
'...'; use ... instead), but we don't.
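
The obj->obj behaviour of the module-level functions summarised above can be seen with a few calls. This is a small sketch using only codecs.encode and codecs.decode; the codec names are the ones already used in this thread.

```python
import codecs

# str -> str: rot_13 maps text to text.
rotated = codecs.encode("example", "rot_13")

# bytes -> bytes: hex_codec turns hex digits into raw bytes.
raw = codecs.decode(b"000102", "hex_codec")

# bytes -> bytes round trip through a compression codec.
compressed = codecs.encode(b"hello", "bz2_codec")
restored = codecs.decode(compressed, "bz2_codec")

print(rotated, raw, restored)
```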

--




[issue17828] More informative error handling when encoding and decoding

2013-05-09 Thread Ezio Melotti

Ezio Melotti added the comment:

The attached proof of concept catches Type/ValueError in str.encode and raises 
another exception with a better message:
>>> 'example'.encode('hex_codec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for hex_codec codec ('str' does not support the buffer interface)

(note: the patch doesn't handle the exception chaining yet and probably leaks.)

If Nick's proposal in msg187704 is accepted, this should become a 
codecs.EncodeTypeError.  The same should then be done for bytes.decode and for 
codecs.encode/decode.

--
Added file: http://bugs.python.org/file30190/issue17828-2.diff




[issue17828] More informative error handling when encoding and decoding

2013-04-25 Thread Nick Coghlan

Nick Coghlan added the comment:

I tracked down the proximate cause of the weird exception in the bytes.decode 
case: the base64 module only accepts bytes and bytearray objects, instead of 
using memoryview to accept anything that supports the buffer API and provides a 
C-contiguous 8-bit view of the underlying data. Raised as issue 17839.

--




[issue17828] More informative error handling when encoding and decoding

2013-04-25 Thread Nick Coghlan

Nick Coghlan added the comment:

Here's an example of the specific type errors raised by additional checks in 
the text-encoding specific methods. I believe the main improvement needed here 
is to mention the encoding name in the exception message:

>>> "example".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoder did not return a bytes object (type=str)

>>> b'BZh91AYSY\xc1uvK\x00\x00\x01F\x80\x00\x10\x00\x04\x00\x00\x10 \x000\xcd\x00\xc1\xa0P\xe2\xeeH\xa7\n\x12\x18.\xae\xc9`'.decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder did not return a str object (type=bytes)

--




[issue17828] More informative error handling when encoding and decoding

2013-04-25 Thread Barry A. Warsaw

Changes by Barry A. Warsaw ba...@python.org:


--
nosy: +barry




[issue17828] More informative error handling when encoding and decoding

2013-04-24 Thread Nick Coghlan

New submission from Nick Coghlan:

Passing the wrong types to codecs can currently lead to rather confusing 
exceptions, like:


>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/base64_codec.py", line 20, in base64_decode
    return (base64.decodebytes(input), len(input))
  File "/usr/lib64/python3.2/base64.py", line 359, in decodebytes
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not memoryview

>>> codecs.decode("example", "utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface


This situation could be improved by having the affected APIs use the exception 
chaining system to wrap these errors in a more informative exception that also 
displays information on the codec involved. Note that UnicodeEncodeError and 
UnicodeDecodeError are not appropriate, as those are specific to text encoding 
operations, while these new wrappers will apply to arbitrary codecs, regardless 
of whether or not they use the unicode error handlers. Furthermore, for 
backwards compatibility with existing exception handling, it is probably 
necessary to limit ourselves to specific exception types and ensure that the 
wrapper exceptions are subclasses of those types.

These new wrappers would have __cause__ set to the exception raised by the 
codec, but emit a message more along the lines of the following:

==
codecs.DecodeTypeError: encoding='utf8', details=TypeError: 'str' does not 
support the buffer interface
==

Wrapping TypeError and ValueError should cover most cases, which would mean 
four new exception types in the codecs module:

Raised by codecs.decode, bytes.decode and bytearray.decode:
* codecs.DecodeTypeError
* codecs.DecodeValueError

Raised by codecs.encode, str.encode:
* codecs.EncodeTypeError
* codecs.EncodeValueError

Instances of UnicodeError wouldn't be wrapped, since they already contain codec 
information.
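
The wrapping idea can be sketched in pure Python. This is only an illustration under stated assumptions: `decode_with_context` is a hypothetical helper, it re-raises plain TypeError and ValueError rather than the new exception classes proposed above, and the message format is an assumption rather than anything specified here.

```python
import codecs


def decode_with_context(obj, encoding):
    """Hypothetical wrapper: add the codec name to TypeError/ValueError.

    The original exception is kept as __cause__ via "raise ... from".
    UnicodeError instances are passed through untouched because they
    already carry codec information.
    """
    try:
        return codecs.decode(obj, encoding)
    except UnicodeError:
        raise
    except TypeError as exc:
        msg = "decoding with {!r} codec failed ({}: {})".format(
            encoding, type(exc).__name__, exc
        )
        raise TypeError(msg) from exc
    except ValueError as exc:
        msg = "decoding with {!r} codec failed ({}: {})".format(
            encoding, type(exc).__name__, exc
        )
        raise ValueError(msg) from exc


try:
    decode_with_context("example", "utf8")
except TypeError as exc:
    print(exc)
    print(repr(exc.__cause__))
```

Because the wrapper raises the same base type it caught, existing `except TypeError` and `except ValueError` handlers keep working, which is the backwards compatibility concern raised above.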

--
components: Library (Lib)
messages: 187704
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: More informative error handling when encoding and decoding
versions: Python 3.4




[issue17828] More informative error handling when encoding and decoding

2013-04-24 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
stage:  -> needs patch
type:  -> enhancement




[issue17828] More informative error handling when encoding and decoding

2013-04-24 Thread Nick Coghlan

Nick Coghlan added the comment:

There may also be some specific improvement to be made to str.encode, 
bytes.decode and bytearray.decode in relation to the additional type checks 
they do to enforce the appropriate input and output types (see the bizarre 
"expected bytes, not memoryview" example).

--




[issue17828] More informative error handling when encoding and decoding

2013-04-24 Thread Florent Xicluna

Changes by Florent Xicluna florent.xicl...@gmail.com:


--
nosy: +flox
