Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-15 Thread mal
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Python just doesn't know the encoding of the 8-bit string, so can't
>> make any assumptions on it. As a result, it raises an exception to inform
>> the programmer.
>
> Oh, Python does make an assumption about what the encoding is: it assumes
> it is the system encoding (i.e. ascii). Then invoking the ascii
> codec raises an exception, because the string clearly isn't ascii.

Right, and as a consequence, Python raises an exception to let the
programmer correct the problem.

The subsequent solution to the problem may result in the
string being decoded into Unicode and the two resulting Unicode
objects being unequal, or it may also result in them being equal.
Python doesn't have this knowledge, so always returning false
is clearly wrong.

Hiding programmer errors is not making life easier in the
long run, so I'm -1 on having the equality comparison return
False.

Instead we should generate a warning in Python 2.5 and introduce
the exception in Python 2.6.

>> Note that you do have to interpret the string as characters
>> if you compare it to Unicode and there's nothing wrong with
>> that.
>
> Consider this:
> py> int(3+4j)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: can't convert complex to int; use int(abs(z))
> py> 3 == 3+4j
> False
>
> So even though the conversion raises an exception, the
> values are determined to be not equal. Again, because int
> is a nearly true subset of complex, the conversion goes
> the other way, but *if* it would use the complex->int
> conversion, then the TypeError should be taken as
> a guarantee that the objects don't compare equal.

In the above example, you clearly know that the two are
unequal due to the relationship between complex numbers
having an imaginary part and integers.

The same is true for the overflow case:

>>> 2**10000 == 1.23
False
>>> float(2**10000)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: long int too large to convert to float

(Note that in Python 2.3 this used to raise an exception as well.)

However, this is not the case for 8-bit string vs. Unicode,
since you cannot use such extra knowledge if you find that the ASCII
encoding assumption obviously doesn't match the string
in question.

> Expanding this view to Unicode should mean that a unicode
> string U equals a byte string B if
> U.encode(system_encoding) == B or B.decode(system_encoding) == U,
> and that they don't equal otherwise

Agreed.

Note that Python always coerces to the bigger type. As a result,
the second option is what is actually implemented in Python.

> (e.g. if the conversion
> fails with a not convertible exception).

I disagree with this part.

Failure to decode a string doesn't imply inequality. It implies
that the programmer needs to step in and correct the problem by
making an explicit and conscious decision.

The alternative would be to decide that equal comparisons should never
be allowed to raise exceptions and instead have the equal comparison
return False. In which case, we'd have to revert the dict patch
altogether and instead silence all exceptions that
are generated during the equal comparison (not only in the dict
implementation), replacing them with a False return value.
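
The two positions in this exchange can be put side by side in a short sketch. It is written in Python 3 syntax, where `bytes` and `str` play the roles of the 8-bit string and Unicode types discussed here; the function names `strict_eq` and `lenient_eq` and the `encoding` parameter are made up for illustration and are not part of Python.

```python
def strict_eq(u, b, encoding="ascii"):
    # Position argued in this message: let the decode error propagate
    # so that the programmer has to deal with it.
    return b.decode(encoding) == u


def lenient_eq(u, b, encoding="ascii"):
    # Alternative position: bytes that cannot be decoded compare unequal.
    try:
        return strict_eq(u, b, encoding)
    except UnicodeDecodeError:
        return False


print(lenient_eq("abc", b"abc"))    # decodable and equal
print(lenient_eq("\xe4", b"\xe4"))  # not ASCII, reported as unequal

try:
    strict_eq("\xe4", b"\xe4")
    strict_outcome = "returned"
except UnicodeDecodeError:
    strict_outcome = "raised UnicodeDecodeError"
print(strict_outcome)
```

Both helpers decode the byte string, matching the "second option" mentioned above; they differ only in what happens when decoding fails.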

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 07 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> Hiding programmer errors is not making life easier in the
> long run, so I'm -1 on having the equality comparison return
> False.

There is no error to hide here. The objects are unequal, period.

> Instead we should generate a warning in Python 2.5 and introduce
> the exception in Python 2.6.

A warning about what? That you can't put byte strings and Unicode
strings into the same dictionary (as keys)? Next, do we stop allowing
numbers and strings in the same dictionary, because there
is no conversion defined between them?

> In the above example, you clearly know that the two are
> unequal due to the relationship between complex numbers
> having an imaginary part and integers.

Right. And so I do when the byte string does not convert to
Unicode.

> However, this is not the case for 8-bit string vs. Unicode,
> since you cannot use such extra knowledge if you find that the ASCII
> encoding assumption obviously doesn't match the string
> in question.

The question is not "Could there be a conversion under which
they are equal?" If you ask that question, then

py> "3"==3
False

should raise an exception, because there exists a conversion under
which these objects are equal:

py> int("3")==3
True

It's just that, under the conversion Python applies, the byte
string and the Unicode string are not equal.

> Note that Python always coerces to the bigger type. As a result,
> the second option is what is actually implemented in Python.
[which is decode-to-unicode]

It might be debatable which of the types is the bigger type. It's
not that byte strings are a true subset of Unicode strings, under
some conversion, since there are byte strings which have no Unicode
equivalent (because they are not characters, and don't convert under
the encoding), and there are Unicode strings that have no byte string
equivalent.

For example, if the system encoding is UTF-8, then byte string is
the bigger type (all Unicode strings convert to byte strings, but
not all byte strings convert to Unicode strings).
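
As a rough illustration of the UTF-8 case, assuming Python 3 syntax (`str` for Unicode text, `bytes` for byte strings) and ordinary well-formed text; the variable names are illustrative only:

```python
# Text survives a round trip through UTF-8.
text = "tr\xe4r\xe4\xe4 \u20ac"
encoded = text.encode("utf-8")
round_trip = encoded.decode("utf-8")
print(round_trip == text)

# Arbitrary bytes, on the other hand, need not be valid UTF-8.
stray = b"\xff\xfe\xfa"
try:
    stray.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
print(decodable)
```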

However, this is a red herring: Python has, for whatever reason,
chosen to convert byte->unicode, and nobody is questioning that
choice.

> I disagree with this part.
>
> Failure to decode a string doesn't imply inequality.

If the failure is "these bytes don't have a meaningful character
interpretation", then the bytes are *clearly* not equal to
some character string.

> It implies
> that the programmer needs to step in and correct the problem by
> making an explicit and conscious decision.

There is no problem to correct. The strings *are* unequal.

> The alternative would be to decide that equal comparisons should never
> be allowed to raise exceptions and instead have the equal comparison
> return False.

There are many reasons why comparison could raise an exception.
It could be out of memory, it could be that there is an
internal/programming error in the codec being used, it could be
that the codec is not found (likewise for other comparisons).

However, if the codec is working properly, and clearly determines
that the byte string has no character string equivalent, then
it can't be equal to some character (unicode) string.

Regards,
Martin


Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread Ralf Schmitt
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Hiding programmer errors is not making life easier in the
>> long run, so I'm -1 on having the equality comparison return
>> False.
>
> There is no error to hide here. The objects are unequal, period.

And in the case of dicts it hides errors randomly...

>> Instead we should generate a warning in Python 2.5 and introduce
>> the exception in Python 2.6.
>
> A warning about what? That you can't put byte string and Unicode
> strings into the same dictionary (as keys)? Next we start not allowing
> to put numbers and strings into the same dictionary, because there
> is no conversion defined between them?

A warning that an exception has been ignored while adding a key to a
dict, I guess. I'd keep those dict changes; this is where real
programmer errors are hidden.

>> I disagree with this part.
>>
>> Failure to decode a string doesn't imply inequality.
>
> If the failure is "these bytes don't have a meaningful character
> interpretation", then the bytes are *clearly* not equal to
> some character string.

One could also think of a magic encoding, which decodes non-ascii 
strings to None, making them compare unequal to any unicode string.



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Failure to decode a string doesn't imply inequality.
>
> If the failure is "these bytes don't have a meaningful character
> interpretation", then the bytes are *clearly* not equal to
> some character string.
>
>> It implies
>> that the programmer needs to step in and correct the problem by
>> making an explicit and conscious decision.
>
> There is no problem to correct. The strings *are* unequal.

If the programmer writes:

x = 'äöü'
y = u'äöü'
...
if x == y:
    do_something()

then he clearly has had the intention to compare two character
strings.

Now, if what you were saying were true, then the above would
simply continue to work without raising an exception, possibly
causing the application to return wrong results.

With the exception, the programmer will have a chance to correct
the problem (in this case, probably a forgotten u-prefix) and also
be safe in not having the application produce wrong data -
something that's usually hard to detect, debug and, more
importantly, can have effects which are a lot worse than
a failing application.

Note that we are not discussing changing the behavior of the
__eq__ comparison between strings and Unicode, since this has
always been to raise exceptions in case the automatic propagation
fails.

The discussion is about silencing exceptions in the dict lookup
mechanism - something which used to happen and now no longer
is done.

Since this behavior is an implementation detail of the
dictionary implementation, users perceive this change as random
exceptions occurring in their application.

While these exceptions do hint at programming errors (the main
reason for no longer silencing them), the particular case in
the dict implementation requires some extra thought.

I've suggested to go about this in a slightly more user-friendly
way, namely by giving a warning instead of raising an exception
in Python 2.5 and then going for the exception in Python 2.6.
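
One way such a transition step could look, as a sketch only: the helper `eq_with_warning` is hypothetical and written in Python 3 syntax; it is not the actual dict or `__eq__` implementation. It reports the decode failure through the `warnings` module instead of raising.

```python
import warnings


def eq_with_warning(u, b, encoding="ascii"):
    # Hypothetical helper: warn on a failed decode, then report "unequal".
    try:
        return b.decode(encoding) == u
    except UnicodeDecodeError:
        warnings.warn(
            "comparison failed to decode bytes; treating them as unequal",
            UnicodeWarning,
            stacklevel=2,
        )
        return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = eq_with_warning("\xe4", b"\xe4")

print(result, [w.category.__name__ for w in caught])
```

A later release could then turn the warning into an exception, for example by removing the `except` branch.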

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> If the programmer writes:
>
> x = 'äöü'
> y = u'äöü'
> ...
> if x == y:
>     do_something()
>
> then he clearly has had the intention to compare two character
> strings.

Programmers make all kinds of mistakes when comparing objects,
assuming that things ought to be equal that actually aren't:

py> 1.6/math.pi*math.pi == 1.6
False
py> if 10*10 is 100:
...   print "yes"
... else:
...   print "no"
...
no
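
The floating-point pitfall is easy to reproduce with other values. A small sketch in Python 3 syntax; `math.isclose` is the tolerance-based comparison from the standard library, and the values 0.1, 0.2 and 0.3 are chosen here only for illustration:

```python
import math

total = 0.1 + 0.2

exact = (total == 0.3)           # exact comparison of rounded binary values
close = math.isclose(total, 0.3)  # comparison within a relative tolerance

print(exact, close)
```

In neither case does the comparison raise; the programmer simply gets the answer to the question that was actually asked.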

> Now, if what you were saying were true, then the above would
> simply continue to work without raising an exception, possibly
> causing the application to return wrong results.

That's correct. It is a programming mistake, hence you get a wrong
result. However, you cannot assume that every comparison between
a string and a Unicode object is always a programming mistake.
You must not raise exceptions just because of a *potential*
programming mistake; that's what PyChecker is there for.

> Note that we are not discussing changing the behavior of the
> __eq__ comparison between strings and Unicode, since this has
> always been to raise exceptions in case the automatic propagation
> fails.

Not sure what you are discussing: This is *precisely* what I'm
discussing. Making that change would solve this problem.

> The discussion is about silencing exceptions in the dict lookup
> mechanism - something which used to happen and now no longer
> is done.

No, that's not what the discussion is about. The discussion
is about the backwards incompatibility in Python 2.5 wrt.
Python 2.4. There are several ways to solve that; silencing
the exception is just one way.

I think it is the wrong way, as I think that
string-unicode-comparison should have a consistent behaviour
no matter where the comparison occurs.

> Since this behavior is an implementation detail of the
> dictionary implementation, users perceive this change as random
> exceptions occurring in their application.

That key comparison occurs is *not* an implementation detail.
It is a necessary and documented aspect of the dictionary
lookup.

> I've suggested to go about this in a slightly more user-friendly
> way, namely by giving a warning instead of raising an exception
> in Python 2.5 and then going for the exception in Python 2.6.

Yes, and I have suggested to make it even more user-friendly
by defining string-unicode-__eq__ in a sensible manner. It
is more user-friendly, because it doesn't show the inconsistency
Michael Hudson documented in

http://mail.python.org/pipermail/python-dev/2006-August/067981.html

Regards,
Martin



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread skip

Martin> Programmers make all kinds of mistakes when comparing objects,
Martin> assuming that things ought to be equal that actually aren't:

py> 1.6/math.pi*math.pi == 1.6
False

By extension, perhaps Computer Science departments should begin offering
Unicode Analysis as an advanced undergraduate class. ;-)

(Sorry, couldn't resist...)

Skip


Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-08 Thread Greg Ewing
M.-A. Lemburg wrote:

> Hiding programmer errors is not making life easier in the
> long run, so I'm -1 on having the equality comparison return
> False.

I don't see how this is greatly different from, e.g.

   [1, 2] == (1, 2)

returning False. Comparing things of different types
may or may not indicate a bug in the code as well,
but we don't seem to worry that it doesn't raise an
exception.

--
Greg


Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-07 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> Python just doesn't know the encoding of the 8-bit string, so can't
> make any assumptions on it. As a result, it raises an exception to inform
> the programmer.

Oh, Python does make an assumption about what the encoding is: it assumes
it is the system encoding (i.e. ascii). Then invoking the ascii
codec raises an exception, because the string clearly isn't ascii.

> It is well possible that the string uses an encoding where the
> Unicode string is indeed equal to the string, assuming this
> encoding

So what? Python uses the system encoding for this operation.
What does it matter that the result would be different if it
had used a different encoding?

The strings are unequal under the system encoding; it's irrelevant
that they might be equal under a different encoding.

The same holds for the ASCII part (i.e. where you don't get an
exception):

py> u"foo" == "sbb"
False
py> u"foo".encode("rot13") == "sbb"
True

So the strings compare as unequal, even though they compare
equal if treated as rot13. That doesn't stop Python from considering
them unequal.
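
The rot13 point can be replayed in Python 3 syntax, where the transform is reached through `codecs.encode` rather than a string method (an adaptation made for this sketch, not the spelling used above):

```python
import codecs

u = "foo"
s = "sbb"

plain_equal = (u == s)                            # compared as they are
rot13_equal = (codecs.encode(u, "rot_13") == s)   # compared after rot13

print(plain_equal, rot13_equal)
```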

Regards,
Martin


[Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Michael Chermside
I'm changing the subject line because I want to convince everyone that
the problem being discussed in the "unicode hell" thread has nothing
to do with unicode and strings. It's all about dicts.

I have not observed real breakage in my own code, but I will present
a sample of made-up-but-completely-reasonable code that works in
2.4, fails in 2.5, and arguably ought to work fine. I think we should
restore the behavior of dicts that when they compare keys for
equality they suppress exceptions (treating the objects as unequal),
or at LEAST retain the behavior for one more release making it a
warning this time.

Here is my sample code:

--- problem_with_dicts.py --
# A sample program to demonstrate that the proposed behavior
# of dicts in Python 2.5 generates bugs. This is not code from
# an actual program, but it is completely reasonable.

# First import from
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413486
# the only 5-star recipe in the Python Cookbook for implementing
# enums.
import cookbookenum

# Then set up some reasonable enums. We'll say we're writing
# a program for dealing with dates.
DaysOfWeek = cookbookenum.Enum(
 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
Months = cookbookenum.Enum(
 'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug',
 'Sep','Oct','Nov','Dec')

# Let's say we also do some translations. Here is a
# useful dictionary:
translations = {}
# which we populate with values
translations[ DaysOfWeek.Mon ] = {
 'en': 'Monday',
 'es': 'Lunes',
 'fr': 'Lundi',
 }
# and assume we do the other days
translations[ Months.Jan ] = {
 'en': 'January',
 'es': 'Enero',
 'fr': 'Janvier',
 }
# and assume we do the other months

# ...then later in the code we could do things like this:
language = 'en'
dayOfWeek = DaysOfWeek.Mon
month = Months.Jan
dayOfMonth = 3
print '%s, %s %s' % (
 translations[dayOfWeek][language],
 translations[month][language],
 dayOfMonth)

# this works in 2.4 but fails in 2.5
- end problem_with_dicts.py 
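
For readers without the recipe at hand, the failure can be sketched with a stand-in class. Everything below is an assumption made for illustration: `EnumValue` is hypothetical and only mimics two properties that would trigger the problem (a hash based on position, and an `__eq__` that refuses values from a different enum). It is written in Python 3 syntax and is not the cookbook code.

```python
class EnumValue:
    """Hypothetical stand-in for one value of a cookbook-style enum."""

    def __init__(self, enum_name, index):
        self.enum_name = enum_name
        self.index = index

    def __hash__(self):
        # Assumption: the hash depends only on the position, so the
        # first value of every enum collides with the others' first value.
        return hash(self.index)

    def __eq__(self, other):
        # Assumption: comparing values of different enums is an error.
        if self.enum_name != other.enum_name:
            raise TypeError("cannot compare values of different enums")
        return self.index == other.index


mon = EnumValue("DaysOfWeek", 0)
jan = EnumValue("Months", 0)

translations = {mon: {"en": "Monday"}}
try:
    # Same hash as mon, so the dict has to call __eq__, which raises.
    translations[jan] = {"en": "January"}
    outcome = "stored"
except TypeError as exc:
    outcome = "raised: %s" % exc
print(outcome)
```

When the exception raised during key comparison is not suppressed, the second assignment fails even though the two keys are plainly meant to be different.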

Please reconsider.

-- Michael Chermside



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Michael Hudson
Michael Chermside writes:

> I'm changing the subject line because I want to convince everyone that
> the problem being discussed in the unicode hell thread has nothing
> to do with unicode and strings. It's all about dicts.

I'd say it's more to do with __eq__.  It's a strange __eq__ method
that raises an Exception, IMHO.

Please do realize that the motivation for this change was hours and
hours of tortuous debugging caused by a buggy __eq__ method making keys
impossibly seem to not be in dictionaries.

> I have not observed real breakage in my own code, but I will present
> a sample of made-up-but-completely-reasonable code that works in
> 2.4, fails in 2.5, and arguably ought to work fine. I think we should
> restore the behavior of dicts that when they compare keys for
> equality they suppress exceptions (treating the objects as unequal),
> or at LEAST retain the behavior for one more release making it a
> warning this time.

Please no.  Here's just one piece of evidence that the 2.4 semantics
are pretty silly too:

>>> d = {u'\xe0':1, '\xe0\:2}
  File "<stdin>", line 1
    d = {u'\xe0':1, '\xe0\:2}
                            ^
SyntaxError: EOL while scanning single-quoted string
>>> d = {u'\xe0':1, '\xe0':2}
>>> '\xe0' in d
True
>>> '\xe0' in d.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
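
The same kind of inconsistency between a hash lookup and a linear scan can be shown without Unicode. A sketch in Python 3 syntax; the class `Picky` is hypothetical and exists only to provide an `__eq__` that raises:

```python
class Picky:
    """Hypothetical key type whose __eq__ refuses foreign types."""

    def __init__(self, ident):
        self.ident = ident

    def __hash__(self):
        return hash(self.ident)

    def __eq__(self, other):
        if not isinstance(other, Picky):
            raise TypeError("Picky keys only compare with Picky keys")
        return self.ident == other.ident


d = {1000: "int key", Picky(7): "picky key"}
probe = Picky(7)

# Hash lookup: only the entry with an equal hash is ever compared.
in_dict = probe in d

# Linear scan: the probe is compared with 1000 first, and __eq__ raises.
try:
    in_keys = probe in list(d)
except TypeError:
    in_keys = "raised TypeError"

print(in_dict, in_keys)
```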

Cheers,
mwh

-- 
  same software, different verbosity settings (this one goes to
  eleven) -- the effbot on the martellibot


Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Terry Reedy

Michael Hudson wrote:
> Michael Chermside writes:
>
>> I'm changing the subject line because I want to convince everyone that
>> the problem being discussed in the unicode hell thread has nothing
>> to do with unicode and strings. It's all about dicts.
>
> I'd say it's more to do with __eq__.  It's a strange __eq__ method
> that raises an Exception, IMHO.

I agree; a == b should always work, certainly unless explicitly programmed
otherwise in Python for a particular class.  So I think the proper solution
is to fix the buggy __eq__ method to return False instead.  If a byte string
does not have a default (ascii) text interpretation, then it obviously is
not equal to any particular unicode text.

The fundamental axiom of sets and hence of dict keys is that any 
object/value either is or is not a member (at any given time for 'mutable' 
set collections).  This requires that testing an object for possible 
membership by equality give a clean True or False answer.

> Please do realize that the motivation for this change was hours and
> hours of tortuous debugging caused by a buggy __eq__ method making keys
> impossibly seem to not be in dictionaries.

So why not fix the buggy __eq__ method?

>> 2.4, fails in 2.5, and arguably ought to work fine. I think we should
>> restore the behavior of dicts that when they compare keys for
>> equality they suppress exceptions (treating the objects as unequal),
>> or at LEAST retain the behavior for one more release making it a
>> warning this time.
>
> Please no.  Here's just one piece of evidence that the 2.4 semantics
> are pretty silly too:

We mostly agreed half a decade ago that 1/2 should be .5 instead of 0, but
to avoid breaking code, have (or Guido has) so far refrained from making the
change the default.  To me, the same principle applies here at least as
strongly.

Terry Jan Reedy





Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread M.-A. Lemburg
Terry Reedy wrote:
> Michael Hudson wrote:
>> Michael Chermside writes:
>>
>>> I'm changing the subject line because I want to convince everyone that
>>> the problem being discussed in the unicode hell thread has nothing
>>> to do with unicode and strings. It's all about dicts.
>> I'd say it's more to do with __eq__.  It's a strange __eq__ method
>> that raises an Exception, IMHO.
>
> I agree; a == b should always work, certainly unless explicitly programmed
> otherwise in Python for a particular class.

... which this is.

> So I think the proper solution
> is to fix the buggy __eq__ method to return False instead.  If a byte string
> does not have a default (ascii) text interpretation, then it obviously is
> not equal to any particular unicode text.
>
> The fundamental axiom of sets and hence of dict keys is that any
> object/value either is or is not a member (at any given time for 'mutable'
> set collections).  This requires that testing an object for possible
> membership by equality give a clean True or False answer.
>
>> Please do realize that the motivation for this change was hours and
>> hours of tortuous debugging caused by a buggy __eq__ method making keys
>> impossibly seem to not be in dictionaries.
>
> So why not fix the buggy __eq__ method?

Because it's not buggy.

Python just doesn't know the encoding of the 8-bit string, so can't
make any assumptions on it. As a result, it raises an exception to inform
the programmer.

It is well possible that the string uses an encoding where the
Unicode string is indeed equal to the string, assuming this
encoding, e.g.

>>> s = 'trärää'
>>> u = u'trärää'
>>> s == u
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 2: ordinal not in range(128)
>>> hash(s)
673683206
>>> hash(u)
673683206

Here, the encoding that creates the match is Latin-1.
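
A sketch of the same observation in Python 3 syntax, where the byte string has to be built explicitly; here it is assumed to be Latin-1 encoded, as in the example above. The hash equality shown above is specific to that interpreter and is not reproduced.

```python
u = "tr\xe4r\xe4\xe4"      # the text 'trärää'
s = u.encode("latin-1")    # assumed: the 8-bit string is Latin-1 data

# Under the ASCII assumption the bytes cannot be interpreted at all.
try:
    s.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Under Latin-1 the two strings turn out to be equal.
latin1_equal = (s.decode("latin-1") == u)

print(ascii_ok, latin1_equal)
```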

>>> 2.4, fails in 2.5, and arguably ought to work fine. I think we should
>>> restore the behavior of dicts that when they compare keys for
>>> equality they suppress exceptions (treating the objects as unequal),
>>> or at LEAST retain the behavior for one more release making it a
>>> warning this time.
>> Please no.  Here's just one piece of evidence that the 2.4 semantics
>> are pretty silly too:
>
> We mostly agreed half a decade ago that 1/2 should be .5 instead of 0, but
> to avoid breaking code, have (or Guido has) refrained from yet making the
> change the default.  To me, the same principle applies here at least as
> strongly.

I think that's a different category of semantic change: the integer
division change will cause applications to return wrong data (if not
fixed properly). The exception will just let the application refuse
to continue.

How about generating a warning instead and then go for the exception
in 2.6 ?

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys

2006-08-04 Thread Barry Warsaw

On Aug 4, 2006, at 11:43 AM, M.-A. Lemburg wrote:

> How about generating a warning instead and then go for the exception
> in 2.6 ?

From the perspective of wanting to avoid blog entries in 2007
railing against our gratuitous breakages in Python 2.5 but also
wanting to avoid perpetuating broken code, this solution seems the
most reasonable.

-Barry



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode asdictionarykeys

2006-08-04 Thread Michael Chermside
Marc-Andre Lemburg writes:

 How about generating a warning instead, and then going for the exception
 in 2.6?

Agreed. Michael Hudson's explanation convinced me.

-- Michael Chermside



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode asdictionarykeys

2006-08-04 Thread Giovanni Bajo
Paul Colomiets [EMAIL PROTECTED] wrote:

 Well, it's not recommended to mix strings and unicode in dictionaries,
 but if we mix, for example, integer and float we have the same thing. It
 doesn't raise an exception, but it is still not the behavior I expect:
   d = { 1.0: 10, 2.0: 20 }
 then if I somewhere later do:
   d[1] = 100
   d[2] = 200
 I end up with all floats in d.keys(). Maybe this is not the best
 example.

There is a strong difference. Python is moving towards unifying number types,
in a way (see the true division issue): the idea is that, all in all, the user
shouldn't really care what type a number is, as long as they know it's a number.
On the other hand, unicode and str are going to diverge more and more.
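
The int/float case from the quoted message can be reproduced directly. A small sketch, relying only on the fact that equal ints and floats compare equal and hash alike: assigning through an int key updates the entry stored under the float key, and (in CPython) the dict keeps the key object it already had.

```python
# Equal numbers of different types hash alike, so they address the same entry.
d = {1.0: 10, 2.0: 20}
d[1] = 100   # 1 == 1.0 and hash(1) == hash(1.0): updates the existing entry
d[2] = 200

keys = list(d.keys())
print(keys)  # the original float keys are kept; only the values change
```

No exception is raised and no new keys appear, which is the behavior being contrasted with the str/unicode case.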

Giovanni Bajo



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode asdictionarykeys

2006-08-04 Thread Bob Ippolito

On Aug 4, 2006, at 12:51 PM, Giovanni Bajo wrote:

 Paul Colomiets [EMAIL PROTECTED] wrote:

 Well, it's not recommended to mix strings and unicode in dictionaries,
 but if we mix, for example, integer and float we have the same thing. It
 doesn't raise an exception, but it is still not the behavior I expect:
   d = { 1.0: 10, 2.0: 20 }
 then if I somewhere later do:
   d[1] = 100
   d[2] = 200
 I end up with all floats in d.keys(). Maybe this is not the best
 example.

 There is a strong difference. Python is moving towards unifying number
 types, in a way (see the true division issue): the idea is that, all in
 all, the user shouldn't really care what type a number is, as long as
 they know it's a number.
 On the other hand, unicode and str are going to diverge more and more.

Well, not really. True division makes int/int return a float instead of
an int. You really do have to care whether you have an int or a float most
of the time; they're very different semantically.
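
A short illustration of the true division semantics referred to here. The sketch assumes Python 3 behavior; on Python 2 the same results need `from __future__ import division` at the top of the module.

```python
# Python 3 semantics; in Python 2 this needs `from __future__ import division`.
true_quotient = 7 / 2    # true division: always a float
floor_quotient = 7 // 2  # floor division: stays an int for int operands
exact = 4 / 2            # still a float, even when the result is whole

print(true_quotient, floor_quotient, exact)
```

The type of the result differs between `/` and `//`, which is why the caller still has to know which kind of number is wanted.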

Unicode and str are eventually going to be the same thing (str would
ideally end up becoming a synonym of unicode). The difference is
that there will be some other type to contain bytes.

-bob



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode asdictionarykeys

2006-08-04 Thread Paul Colomiets
Giovanni Bajo wrote:
 Paul Colomiets [EMAIL PROTECTED] wrote:

   
 Well, it's not recommended to mix strings and unicode in dictionaries,
 but if we mix, for example, integer and float we have the same thing. It
 doesn't raise an exception, but it is still not the behavior I expect:
   d = { 1.0: 10, 2.0: 20 }
 then if I somewhere later do:
   d[1] = 100
   d[2] = 200
 I end up with all floats in d.keys(). Maybe this is not the best
 example.
 

 There is a strong difference. Python is moving towards unifying number
 types, in a way (see the true division issue): the idea is that, all in
 all, the user shouldn't really care what type a number is, as long as
 they know it's a number.
 On the other hand, unicode and str are going to diverge more and more.

 Giovanni Bajo

   
It makes sense, but consider this example:

  >>> from decimal import Decimal
  >>> d = {}
  >>> d[Decimal("0")] = 1
  >>> d[0] = 2
  >>> d[Decimal("0.5")] = 3
  >>> d[0.5] = 4
  >>> d.keys()
  [Decimal("0"), 0.5, Decimal("0.5")]

I expect d.keys() to have 2 or 4 keys, but not 3; it's confusing. Don't
you think that someday the line d[0.5] = 4 will raise an exception? Or at
least that it should raise if mixing str and unicode raises?
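
For comparison, a sketch of the same session on a current Python 3 interpreter. It assumes Python 3.2 or later, where Decimal and float compare by exact value and hash consistently; on the Python 2.5 discussed in this thread the result is the three keys shown above.

```python
from decimal import Decimal

d = {}
d[Decimal("0")] = 1
d[0] = 2                 # 0 == Decimal("0"): overwrites the value stored above
d[Decimal("0.5")] = 3
d[0.5] = 4               # 0.5 == Decimal("0.5") on Python 3.2+: overwrites again

print(len(d), list(d.keys()))
```

Under that assumption the dict ends up with two keys, the Decimal objects inserted first, each holding the value assigned last.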

--
Regards,
  Paul.


Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode asdictionarykeys

2006-08-04 Thread Ron Adam
M.-A. Lemburg wrote:
 Terry Reedy wrote:
 Michael Hudson [EMAIL PROTECTED] wrote in message 
 news:[EMAIL PROTECTED]
 Michael Chermside [EMAIL PROTECTED] writes:

 I'm changing the subject line because I want to convince everyone that
 the problem being discussed in the unicode hell thread has nothing
 to do with unicode and strings. It's all about dicts.
 I'd say it's more to do with __eq__.  It's a strange __eq__ method
 that raises an Exception, IMHO.
 I agree; a == b should always work, certainly unless explicitly programmed 
 otherwise in Python for a particular class. 
 
 ... which this is.
 
 So I think the proper solution 
 is to fix the buggy __eq__ method to return False instead.  If a byte string 
 does not have a default (ascii) text interpretation, then it obviously is 
 not equal to any particular unicode text.

 The fundamental axiom of sets and hence of dict keys is that any 
 object/value either is or is not a member (at any given time for 'mutable' 
 set collections).  This requires that testing an object for possible 
 membership by equality give a clean True or False answer.

 Please do realize that the motivation for this change was hours and
 hours of tortuous debugging caused by a buggy __eq__ method making keys
 impossibly seem to not be in dictionaries.
 So why not fix the buggy __eq__ method?
 
 Because it's not buggy.
 
 Python just doesn't know the encoding of the 8-bit string, so can't
 make any assumptions about it. As a result, it raises an exception to inform
 the programmer.
 
 It is quite possible that the string uses an encoding where the
 Unicode string is indeed equal to the string, assuming this
 encoding, e.g.

Isn't this a case where it should be up to the programmer to make sure 
the comparison makes sense in the context in which it is being used?  That 
is, if I'm comparing two different forms of data, it's up to me to convert 
them explicitly to the same form before comparing them?
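
A minimal sketch of what explicit conversion looks like, written with Python 3 `bytes` and `str` standing in for the 8-bit and Unicode strings of the thread. The byte value and the two codecs are chosen purely for illustration.

```python
raw = b"\xe4"    # one byte whose meaning depends on the encoding
text = "\xe4"    # U+00E4, LATIN SMALL LETTER A WITH DIAERESIS

# The programmer states the encoding; the interpreter does not guess.
as_latin1 = raw.decode("latin-1")
as_cp437 = raw.decode("cp437")

print(as_latin1 == text)  # equal under the Latin-1 reading
print(as_cp437 == text)   # a different character under the cp437 reading
```

The same byte is equal or unequal to the text depending on the encoding chosen, which is the information the interpreter lacks when it is asked to compare the two directly.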

In the case of comparing an 8-bit string and unicode, I would think they 
are always unequal.  But changing that now would probably (?) break way 
too much. (But it may also uncover quite a few potential or even real 
bugs as well.) ;-)
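
For reference, the "always unequal" rule is how Python 3 treats `bytes` and `str`. The sketch below shows that later behavior, not anything available in the 2.x versions under discussion, and assumes the interpreter is run without the `-b` option (which adds a warning for such comparisons).

```python
# Python 3 semantics, shown for comparison; the thread predates it.
raw, text = b"abc", "abc"

always_unequal = (raw == text)   # False, even though both spell "abc"

d = {text: 1}
d[raw] = 2                       # a second, distinct key

print(always_unequal, len(d))
```

Because the comparison never raises and never succeeds, the two objects simply occupy separate dictionary entries.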

Cheers,
Ron



Re: [Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode asdictionarykeys

2006-08-04 Thread Josiah Carlson

Jean-Paul Calderone [EMAIL PROTECTED] wrote:
 On Fri, 04 Aug 2006 11:23:10 -0700, Josiah Carlson [EMAIL PROTECTED] wrote:
 There's one problem with generating a warning for 2.5, and that is the
 same problem as generating a warning for possible packages that lack an
 __init__.py; users may start to get a bunch of warnings, and be unaware
 of how to suppress them.
 
 All in all though, I'm +0 on the warning, and +1 on it not raising an
 exception in 2.5.
 
 Um.  This warning would indicate a bug in the code which will lead to
 actual misbehavior in a future version of Python.  The __init__.py
 warning would have indicated a deployment configuration which didn't
 actually cause any misbehavior.
 
 They aren't the same case at all, unless you think that all warnings
 should be classed this way (a position I do not think is completely
 unreasonable, but since you singled out the package warning by way of
 comparison, I assume this is not the argument you are trying to make).

I see both as a potential source of a large number of warning
messages for people starting to use Python 2.5 (coming from 2.3 or 2.4).
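
On the question of how a user would silence such warnings, or turn them into errors early, here is a small sketch using the standard `warnings` filters. The `noisy` function is a hypothetical stand-in for code that triggers the warning; `UnicodeWarning` is used as the category only as an assumption about how the warning would be classified.

```python
import warnings


def noisy():
    """Hypothetical stand-in for a comparison that triggers the warning."""
    warnings.warn("mixed str/unicode comparison", UnicodeWarning)
    return False


# Suppress the warning inside a limited scope.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UnicodeWarning)
    silenced = noisy()

# Or escalate it to an exception to find the offending code.
with warnings.catch_warnings():
    warnings.simplefilter("error", UnicodeWarning)
    try:
        noisy()
        escalated = False
    except UnicodeWarning:
        escalated = True

print(silenced, escalated)
```

The "ignore" filter addresses the noise concern, while the "error" filter gives the stricter behavior to anyone who wants it ahead of time.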

 - Josiah
