[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The bare UnicodeError is also raised by the following codecs: utf_16, utf_32, 
punycode, undefined, and the East-Asian multibyte codecs, and by the undocumented 
and unused function urllib.urlparse.to_bytes().

I think it would be nice to be more specific if possible.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-18 Thread R. David Murray

R. David Murray added the comment:

I wonder if we originally only had UnicodeError and it got split later but 
these codecs were never updated.  The codecs date back to the start of unicode 
support in python2, I think.

Adding MAL, he's likely to have an opinion on this ;)

Oh, right.  The more likely possibility is that there was (in python2) no way 
to know if the operation was (from the user's POV) encoding or decoding when 
the codec was called.  In python3 we do know, when the codec is called via 
encode/decode, but the codecs are still generic in principle.  So yeah, we need 
MAL's opinion.  (Or, I could be completely confused, since I always found 
encode/decode confusing in python2 :)

--
nosy: +lemburg




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-18 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 18.12.2015 20:25, R. David Murray wrote:
> I wonder if we originally only had UnicodeError and it got split later but 
> these codecs were never updated.  The codecs date back to the start of 
> unicode support in python2, I think.

UnicodeDecodeError and UnicodeEncodeError were added in Python 2.3
as part of the more flexible error handlers.

> Adding MAL, he's likely to have an opinion on this ;)
> 
> Oh, right.  The more likely possibility is that there was (in python2) no way 
> to know if the operation was (from the user's POV) encoding or decoding when 
> the codec was called.  In python3 we do know, when the codec is called via 
> encode/decode, but the codecs are still generic in principle.  So yeah, we 
> need MAL's opinion.  (Or, I could be completely confused, since I always 
> found encode/decode confusing in python2 :)

There's a clear direction with codecs:
- encode: transform to the encoded data
- decode: transform back from the encoded data

Take e.g. the hex codec. It encodes data into hex format and
decodes from hex format back into the original data.

The IDNA codec transforms Unicode domains into the IDNA format
(.encode()) and back to Unicode again (.decode()).

It was added in Python 2.3 as well, so I guess it was just
an overlap/oversight that it was not adapted to the new error
classes.
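
The direction described above can be checked with a short round trip. This is a minimal sketch, assuming Python 3; the sample domain and the helper name round_trip are made up for illustration and do not come from the thread:

```python
import codecs


def round_trip(domain):
    """Encode a Unicode domain with the idna codec, then decode it back."""
    encoded = domain.encode("idna")    # str -> ASCII-compatible bytes
    decoded = encoded.decode("idna")   # bytes -> str
    return encoded, decoded


# hex is a bytes-to-bytes codec, so it goes through codecs.encode() and
# codecs.decode() rather than through str.encode().
hex_encoded = codecs.encode(b"abc", "hex")
hex_decoded = codecs.decode(hex_encoded, "hex")

# "\u00fc" is the letter u with diaeresis; the domain itself is hypothetical.
idna_encoded, idna_decoded = round_trip("b\u00fccher.example")
```

In both cases encode produces the encoded form and decode recovers the original data, regardless of whether the original is str or bytes.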

--




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-17 Thread SpaceOne

SpaceOne added the comment:

It makes error handling really hard.
Here is a patch:
https://github.com/python/cpython/compare/master...spaceone:idna?expand=1

--
status: closed -> open




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-17 Thread R. David Murray

R. David Murray added the comment:

Can you explain why it makes error handling hard?  I'm still not seeing the use 
case.  I've always viewed UnicodeEncodeError vs UnicodeDecodeError as "extra" 
information for the consumer of the error message, not something that matters 
in code (I just catch UnicodeError).

I'm not objecting to the change, but it might be nice to know why Martin chose 
plain UnicodeError, if he's got the time to answer.

--
nosy: +loewis




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-17 Thread SpaceOne

SpaceOne added the comment:

Because I need to do this everywhere I use it:

try:
    user_input.encode(encoding)
except UnicodeDecodeError:
    raise
except (UnicodeError, UnicodeEncodeError):
    do_my_error_handling()

instead of

try:
    user_input.encode(encoding)
except UnicodeEncodeError:
    do_my_error_handling()

--




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-17 Thread SilentGhost

SilentGhost added the comment:

I think what David was trying to say is that you could do

try:
    user_input.encode(encoding)
except UnicodeError:
    do_my_error_handling()

since UnicodeError is a super class of UnicodeDecodeError and 
UnicodeEncodeError.
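
A minimal sketch of that approach, assuming Python 3; the helper name encode_or_none is hypothetical and only serves to show that a single except clause covers the bare error and both subclasses:

```python
def encode_or_none(text, encoding):
    """Return text encoded with `encoding`, or None on any Unicode error."""
    try:
        return text.encode(encoding)
    except UnicodeError:
        # Catches a bare UnicodeError as well as UnicodeEncodeError and
        # UnicodeDecodeError, because both are subclasses of UnicodeError.
        return None


empty_label = encode_or_none("..", "idna")           # idna codec rejects it
not_ascii = encode_or_none("\u20ac", "ascii")        # UnicodeEncodeError
plain_domain = encode_or_none("example.com", "idna")
```

The trade-off is that the handler can no longer tell from the exception class alone whether an encode or a decode step failed.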

--
nosy: +SilentGhost




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-17 Thread SpaceOne

SpaceOne added the comment:

I know that UnicodeEncodeError is a subclass of UnicodeError. The problem here 
is that UnicodeError would also catch UnicodeDecodeError.
This is especially disturbing if you catch errors of a whole function.

If you e.g. use python2.7 you might want to catch only UnicodeEncodeError if 
you encode something and don't want to catch UnicodeDecodeError.

>>> b'\xff'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

(Read that code carefully!!! It's not something which should ever be done but 
might happen in the world)
Especially if you are writing python2+3 compatible applications.
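
The concern can be illustrated in Python 3 terms with a function that both decodes and encodes. This is a sketch under assumptions: the function name transcode and its status strings are invented for illustration, and which branch the idna case lands in depends on what the codec raises in the Python version at hand:

```python
def transcode(raw, source_encoding, target_encoding):
    """Decode raw bytes, re-encode the text, and report which step failed."""
    try:
        text = raw.decode(source_encoding)
        return "ok", text.encode(target_encoding)
    except UnicodeDecodeError:
        return "decode failed", None
    except UnicodeEncodeError:
        return "encode failed", None
    except UnicodeError:
        # A bare UnicodeError: the class alone does not say which step failed.
        return "unknown direction", None


bad_input = transcode(b"\xff", "utf-8", "ascii")
bad_output = transcode("\u20ac".encode("utf-8"), "utf-8", "ascii")
empty_labels = transcode(b"..", "ascii", "idna")
```

With specific exception classes the first two cases are told apart without inspecting the message; a codec that raises the bare class falls through to the last branch.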

--




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-16 Thread SpaceOne

SpaceOne added the comment:

But why is the error UnicodeError instead of UnicodeEncodeError?

--
resolution: not a bug -> 
status: closed -> open




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-16 Thread R. David Murray

R. David Murray added the comment:

Why does it matter?  If you want to suggest changing it, you could propose a 
patch.  Maybe in reading the code you'll find out why it is the way it is now.  
I haven't looked at that code in a while myself, so I don't remember if there 
is a reason or not :)

--




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-16 Thread R. David Murray

R. David Murray added the comment:

The error message is accurate.  That string has empty label segments in it, 
which RFC 5890 defines as an error on encoding.  There is no such error defined 
for decoding, so that doesn't raise an error.

I don't see anything wrong with the error message, it includes the same one as 
raised in python2.  Perhaps you are confused by the error chaining introduced 
in Python3?  The second part of the traceback is coming from the encoding 
machinery, while the first part lets you know where in the encoder the error 
was raised.  In this case having both doesn't provide much additional 
information, but if one was debugging a codec or the error were coming from 
inside an application, it would.
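
For readers unfamiliar with chaining, here is a minimal sketch of how the two parts of such a traceback relate. It assumes Python 3 and uses explicit "raise ... from ..." in user code; encode_label, encode_domain and exception_chain are hypothetical helpers that only mimic the shape of the traceback in this report, not the real codec machinery:

```python
def exception_chain(exc):
    """Return the exception followed by its explicit causes, outermost first."""
    chain = []
    while exc is not None:
        chain.append(exc)
        exc = exc.__cause__
    return chain


def encode_label(label):
    """Hypothetical inner step that rejects empty labels."""
    if not label:
        raise UnicodeError("label empty or too long")
    return label.encode("ascii")


def encode_domain(domain):
    """Hypothetical outer step that re-raises with added context."""
    try:
        return b".".join(encode_label(part) for part in domain.split("."))
    except UnicodeError as exc:
        detail = "%s: %s" % (type(exc).__name__, exc)
        raise UnicodeError("encoding failed (%s)" % detail) from exc


try:
    encode_domain("..")
except UnicodeError as exc:
    messages = [str(e) for e in exception_chain(exc)]
```

The outer exception carries the context and its __cause__ attribute points at the original error, which is what the "direct cause" line in a chained traceback reports.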

--
nosy: +r.david.murray
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-16 Thread R. David Murray

Changes by R. David Murray:


--
status: open -> closed




[issue25880] u'..'.encode('idna') → UnicodeError: label empty or too long

2015-12-16 Thread SpaceOne

New submission from SpaceOne:

Python 3.4.2 (default, Oct  8 2014, 10:45:20)
>>> u'..'.encode('idna')
Traceback (most recent call last):
  File "/usr/lib/python3.4/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

→ I expected this either not to raise at all or to raise UnicodeEncodeError.

>>> b'..'.decode('idna')
'..'
→ Why doesn't this raise then, too?

The error message is also messed up which wasn't the case in python 2.7. It 
could be cleaned up.

--
components: Unicode
messages: 256514
nosy: ezio.melotti, haypo, spaceone
priority: normal
severity: normal
status: open
title: u'..'.encode('idna') → UnicodeError: label empty or too long
versions: Python 2.7, Python 3.4
