Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread M.-A. Lemburg
Armin Rigo wrote: Hi, On Thu, Aug 03, 2006 at 07:53:11PM +0200, M.-A. Lemburg wrote: I thought I'd heard (from Guido here or on the py3k list) that it was only 1 < u'abc' that would raise an exception, and that 1 == u'abc' would still evaluate to False. Did I misunderstand? Could be that

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread David Hopwood
Martin v. Löwis wrote: David Hopwood wrote: Michael Foord wrote: David Hopwood wrote: [snip..] we should, of course, continue to use the one we always used (for ascii, there is no difference between the two). +1 This seems the most (only ?) logical solution. No; always considering Unicode

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Martin v. Löwis
M.-A. Lemburg wrote: There's no disputing that an exception should be raised if the string *must* be interpretable as characters in order to continue. But that's not true here if you allow for the interpretation that they're simply objects of different (duck) type and therefore unequal.

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Michael Foord
Martin v. Löwis wrote: [snip..] Expanding this view to Unicode should mean that a unicode string U equals a byte string B if U.encode(system_encoding) == B or B.decode(system_encoding) == U, and that they don't equal otherwise (e.g. if the conversion fails with a not convertible exception).
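
The equality rule quoted above can be sketched in modern Python 3 terms, with str standing in for Python 2's unicode and bytes for its byte strings. This is only an illustration: the helper name loosely_equal and the default encoding of 'ascii' are assumptions made here, not anything proposed in the thread.

```python
def loosely_equal(u, b, encoding="ascii"):
    """Hypothetical helper: text u equals bytes b if either conversion
    under `encoding` matches; a failed conversion counts as unequal."""
    try:
        if u.encode(encoding) == b:
            return True
    except UnicodeEncodeError:
        pass  # u cannot be expressed in this encoding
    try:
        return b.decode(encoding) == u
    except UnicodeDecodeError:
        return False  # b is not valid in this encoding, so "not equal"


ascii_match = loosely_equal("abc", b"abc")
non_ascii_under_ascii = loosely_equal("m\xe1s", b"m\xe1s")
non_ascii_under_latin1 = loosely_equal("m\xe1s", b"m\xe1s", encoding="latin-1")
```

Under this sketch the answer for non-ASCII data depends entirely on which encoding is assumed, which is the point the rest of the thread argues about.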

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread David Hopwood
Michael Foord wrote: Martin v. Löwis wrote: [snip..] Expanding this view to Unicode should mean that a unicode string U equals a byte string B if U.encode(system_encoding) == B or B.decode(system_encoding) == U, and that they don't equal otherwise (e.g. if the conversion fails with a not

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Martin v. Löwis
David Hopwood wrote: I disagree. Unicode strings should always be considered distinct from non-ASCII byte strings. Implicitly encoding or decoding in order to perform a comparison is a bad idea; it is expensive and will often do the wrong thing. That's a pretty irrelevant position at this

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Armin Rigo
Hi, On Thu, Aug 03, 2006 at 07:53:11PM +0200, M.-A. Lemburg wrote: I thought I'd heard (from Guido here or on the py3k list) that it was only 1 < u'abc' that would raise an exception, and that 1 == u'abc' would still evaluate to False. Did I misunderstand? Could be that I'm wrong. I

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Michael Foord
David Hopwood wrote:[snip..] we should, of course, continue to use the one we always used (for ascii, there is no difference between the two). +1 This seems the most (only ?) logical solution. No; always considering Unicode and non-ASCII byte strings to be distinct is

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread David Hopwood
Michael Foord wrote: David Hopwood wrote:[snip..] we should, of course, continue to use the one we always used (for ascii, there is no difference between the two). +1 This seems the most (only ?) logical solution. No; always considering Unicode and non-ASCII byte strings to be distinct

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Martin v. Löwis
David Hopwood wrote: Michael Foord wrote: David Hopwood wrote: [snip..] we should, of course, continue to use the one we always used (for ascii, there is no difference between the two). +1 This seems the most (only ?) logical solution. No; always considering Unicode and non-ASCII byte

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Martin v. Löwis
Armin Rigo wrote: I also seem to remember that TypeErrors should only signal ordering nonsense, not equality. In this case, I'm of the opinion that unicode objects and completely-unrelated strings of random bytes should successfully compare as unequal, but I'm not enough of a unicode user

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Ralf Schmitt
Jean-Paul Calderone wrote: I like the exception that 2.5 raises. I only wish it raised by default when using 'ascii' and u'ascii' as keys in the same dictionary. ;) Oh, and that str and unicode did not hash like they do. ;) No problem: >>> import sys >>> reload(sys) <module 'sys' (built-in)>

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread M.-A. Lemburg
Ralf Schmitt wrote: Does python 2.4 catch any exception when comparing keys (which are not basestrings) in dictionaries? Yes. It does so for all equality compares that need to be done as part of the hash collision algorithm (not only w/r to strings and Unicode, but in general). This was
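
For readers who want to see where those equality compares happen, here is a small sketch for current Python 3. The class CollidingKey is hypothetical and exists only to force a hash collision with a comparison that fails; it is not code from the thread. According to the message above, Python 2.4 swallowed such an exception during the lookup, whereas later versions let it propagate.

```python
class CollidingKey:
    """Hypothetical key type: every instance has the same hash, and any
    equality comparison raises, so dict insertion must hit the failure."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1  # force every key into the same hash bucket

    def __eq__(self, other):
        raise ValueError("cannot compare keys")


d = {CollidingKey("a"): 1}
try:
    # Same hash, different object: the dict has to call __eq__ to decide
    # whether this is an existing key, and that call raises.
    d[CollidingKey("b")] = 2
    outcome = "stored"
except ValueError:
    outcome = "raised"
```

The comparison is only attempted because the two hashes collide; keys with different hashes would normally be stored without any call to __eq__.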

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Ralf Schmitt
M.-A. Lemburg wrote: Ralf Schmitt wrote: Does python 2.4 catch any exception when comparing keys (which are not basestrings) in dictionaries? Yes. It does so for all equality compares that need to be done as part of the hash collision algorithm (not only w/r to strings and Unicode, but in

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Michael Hudson
M.-A. Lemburg writes: The point here is that a typical user won't expect any comparisons to be made when dealing with dictionaries, simply because the fact that you do need to make comparisons is an implementation detail. Of course looking things up in a dictionary involves

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread M.-A. Lemburg
Greg Ewing wrote: M.-A. Lemburg wrote: If a string is not ASCII and thus causes the exception, there's not a lot you can say, since you don't know the encoding of the string. That's one way of looking at it. Another is that any string containing chars > 127 is not text at all, but

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread M.-A. Lemburg
Ralf Schmitt wrote: M.-A. Lemburg wrote: Ralf Schmitt wrote: Does python 2.4 catch any exception when comparing keys (which are not basestrings) in dictionaries? Yes. It does so for all equality compares that need to be done as part of the hash collision algorithm (not only w/r to strings

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Bob Ippolito
On Aug 3, 2006, at 9:34 PM, Josiah Carlson wrote: Bob Ippolito wrote: On Aug 3, 2006, at 6:51 PM, Greg Ewing wrote: M.-A. Lemburg wrote: Perhaps we ought to add an exception to the dict lookup mechanism and continue to silence UnicodeErrors ?! Seems to me that

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Christopher Armstrong
On 8/4/06, Ralf Schmitt wrote: Jean-Paul Calderone wrote: I like the exception that 2.5 raises. I only wish it raised by default when using 'ascii' and u'ascii' as keys in the same dictionary. ;) Oh, and that str and unicode did not hash like they do. ;) No problem: import sys

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Ralf Schmitt
Christopher Armstrong wrote: On 8/4/06, Ralf Schmitt wrote: Maybe this is all just a matter of choosing the right defaultencoding ? :) Doing this is amazingly stupid. I can't believe how often I hear this suggestion. Apparently

[Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Ralf Schmitt
Still trying to port our software. Here's another thing I noticed: d = {} d[u'm\xe1s'] = 1 d['m\xe1s'] = 1 print d With python 2.4 I can add those two keys to the dictionary and get: $ python2.4 t2.py {u'm\xe1s': 1, 'm\xe1s': 1} With python 2.5 I get: $ python2.5 t2.py Traceback (most recent
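
As a point of comparison, the same two-key experiment can be written for Python 3, where text (str) and byte strings (bytes) are separate types that never compare equal. This is a sketch of today's behaviour under a default interpreter invocation, not a reproduction of the Python 2.4 or 2.5 results reported above.

```python
d = {}
d["m\xe1s"] = 1    # text key, the counterpart of Python 2's u'm\xe1s'
d[b"m\xe1s"] = 1   # bytes key, the counterpart of Python 2's 'm\xe1s'

# The two keys are of different types and compare unequal without any
# implicit decoding, so both entries are kept.
key_count = len(d)
keys_equal = "m\xe1s" == b"m\xe1s"
```

Both keys coexist here because no attempt is made to decode the bytes key when it is compared with the text key.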

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Ralf Schmitt
Ralf Schmitt wrote: Still trying to port our software. Here's another thing I noticed: d = {} d[u'm\xe1s'] = 1 d['m\xe1s'] = 1 print d With python 2.4 I can add those two keys to the dictionary and get: $ python2.4 t2.py {u'm\xe1s': 1, 'm\xe1s': 1} With python 2.5 I get: $

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread M.-A. Lemburg
Ralf Schmitt wrote: Ralf Schmitt wrote: Still trying to port our software. Here's another thing I noticed: d = {} d[u'm\xe1s'] = 1 d['m\xe1s'] = 1 print d With python 2.4 I can add those two keys to the dictionary and get: $ python2.4 t2.py {u'm\xe1s': 1, 'm\xe1s': 1} With python 2.5

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Bob Ippolito
On Aug 3, 2006, at 9:51 AM, M.-A. Lemburg wrote: Ralf Schmitt wrote: Ralf Schmitt wrote: Still trying to port our software. Here's another thing I noticed: d = {} d[u'm\xe1s'] = 1 d['m\xe1s'] = 1 print d With python 2.4 I can add those two keys to the dictionary and get: $ python2.4

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Ralf Schmitt
M.-A. Lemburg wrote: Ralf Schmitt wrote: Ralf Schmitt wrote: Still trying to port our software. Here's another thing I noticed: d = {} d[u'm\xe1s'] = 1 d['m\xe1s'] = 1 print d With python 2.4 I can add those two keys to the dictionary and get: $ python2.4 t2.py {u'm\xe1s': 1, 'm\xe1s':

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread M.-A. Lemburg
Ralf Schmitt wrote: Still trying to port our software. Here's another thing I noticed: d = {} d[u'm\xe1s'] = 1 d['m\xe1s'] = 1 print d With python 2.5 I get: $ python2.5 t2.py Traceback (most recent call last): File "t2.py", line 3, in <module> d['m\xe1s'] = 1 UnicodeDecodeError:

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread John J Lee
On Thu, 3 Aug 2006, M.-A. Lemburg wrote: [...] It's actually a good preparation for Py3k where 1 == u'abc' will (likely) also raise an exception. I thought I'd heard (from Guido here or on the py3k list) that it was only 1 < u'abc' that would raise an exception, and that 1 == u'abc' would still

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread M.-A. Lemburg
John J Lee wrote: On Thu, 3 Aug 2006, M.-A. Lemburg wrote: [...] It's actually a good preparation for Py3k where 1 == u'abc' will (likely) also raise an exception. I thought I'd heard (from Guido here or on the py3k list) that it was only 1 < u'abc' that would raise an exception, and that 1

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread M.-A. Lemburg
Jim Jewett wrote: http://mail.python.org/pipermail/python-dev/2006-August/067934.html M.-A. Lemburg mal at egenix.com Ralf Schmitt wrote: Still trying to port our software. Here's another thing I noticed: d = {} d[u'm\xe1s'] = 1 d['m\xe1s'] = 1 print d (a 2-element dictionary,

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Delaney, Timothy (Tim)
M.-A. Lemburg wrote: Perhaps we ought to add an exception to the dict lookup mechanism and continue to silence UnicodeErrors ?! I'd definitely consider a UnicodeError to be an indication that two objects are not equal. At the very least, in the context of a dictionary lookup. Tim Delaney

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Michael Urman
On 8/3/06, M.-A. Lemburg wrote: ...but in the case of dictionaries this behaviour has changed and in prior versions of python dictionaries did work as I expected them to. Now they don't. Let's put it this way: Python 2.5 uncovered a bug in your application that has

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Greg Ewing
M.-A. Lemburg wrote: Perhaps we ought to add an exception to the dict lookup mechanism and continue to silence UnicodeErrors ?! Seems to me that comparison of unicode and non-unicode strings for equality shouldn't raise exceptions in the first place. -- Greg

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread James Y Knight
On Aug 3, 2006, at 5:47 PM, M.-A. Lemburg wrote: The only way this error could be the right thing is if you were trying to suggest that he shouldn't mix unicode and bytestrings at all. Good question. I wonder whether that's a reasonable approach for Python 2.x (I'd say it is for Py3k).

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Bob Ippolito
On Aug 3, 2006, at 6:51 PM, Greg Ewing wrote: M.-A. Lemburg wrote: Perhaps we ought to add an exception to the dict lookup mechanism and continue to silence UnicodeErrors ?! Seems to me that comparison of unicode and non-unicode strings for equality shouldn't raise exceptions in the

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Josiah Carlson
Bob Ippolito wrote: On Aug 3, 2006, at 6:51 PM, Greg Ewing wrote: M.-A. Lemburg wrote: Perhaps we ought to add an exception to the dict lookup mechanism and continue to silence UnicodeErrors ?! Seems to me that comparison of unicode and non-unicode strings for

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread James Y Knight
On Aug 4, 2006, at 12:34 AM, Josiah Carlson wrote: As an alternate idea, rather than attempting to .decode('ascii') when strings and unicode compare, why not .decode('latin-1')? We lose the unicode decoding error, but the right thing happens (in my opinion) when u'\xa1' and '\xa1' compare.
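
The difference between the two decodings mentioned here is easy to check in Python 3, using bytes for the byte string and str for the unicode string. This sketch only shows what each codec does with the byte 0xA1; it does not model the Python 2 comparison machinery itself.

```python
raw = b"\xa1"    # a single non-ASCII byte
text = "\xa1"    # the code point U+00A1

# Decoding as ASCII fails for any byte above 0x7F.
try:
    raw.decode("ascii")
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False

# Latin-1 maps every byte 0x00-0xFF to the code point with the same number,
# so decoding can never fail and b'\xa1' becomes u'\xa1'.
latin1_equal = raw.decode("latin-1") == text
all_bytes_decode = len(bytes(range(256)).decode("latin-1")) == 256
```

Because Latin-1 decoding never fails, using it would remove the exception, but it would also treat bytes that are really UTF-8 or another encoding as if they were Latin-1 text.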

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Jean-Paul Calderone
On Thu, 03 Aug 2006 21:34:04 -0700, Josiah Carlson wrote: Bob Ippolito wrote: On Aug 3, 2006, at 6:51 PM, Greg Ewing wrote: M.-A. Lemburg wrote: Perhaps we ought to add an exception to the dict lookup mechanism and continue to silence UnicodeErrors ?!

Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys

2006-08-03 Thread Michael Urman
On 8/3/06, Josiah Carlson wrote: As an alternate idea, rather than attempting to .decode('ascii') when strings and unicode compare, why not .decode('latin-1')? We lose the unicode decoding error, but the right thing happens (in my opinion) when u'\xa1' and '\xa1' compare.