Ralf Schmitt wrote:
>>>> Still trying to port our software. here's another thing I noticed:
>>>>
>>>> d = {}
>>>> d[u'm\xe1s'] = 1
>>>> d['m\xe1s'] = 1
>>>> print d
>>>>
>>>> With python 2.5 I get:
>>>>
>>>> $ python2.5 t2.py
>>>> Traceback (most recent call last):
>>>>   File "t2.py", line 3, in <module>
>>>>     d['m\xe1s'] = 1
>>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 1:
>>>> ordinal not in range(128)
>>>>
>> This is because Unicode and 8-bit string keys only work
>> in the same way if and only if they are plain ASCII.
>
> This is okay. But in the case where one is not ASCII I would prefer to
> be able to compare them (not equal) instead of getting a UnicodeError.
> I know it's too late to change this, ...

It is too late to change this, since it was always like this ;-)

Seriously, Unicode is doing the right thing here: you should really
always get an exception if you compare apples and oranges,
rather than reverting to comparing the ids of apples and
oranges as a fall-back solution.

I believe that Py3k will implement this.

>> The reason lies in the hash function used by Unicode: it is
>> crafted to make hash(u) == hash(s) for all ASCII s, such
>> that s == u.
>>
>> For non-ASCII strings, there are no guarantees as to the
>> hash value of the strings or whether they match or not.
>>
>> This has been like that since Unicode was introduced, so it's
>> not new in Python 2.5.
>>
>
> ...but in the case of dictionaries this behaviour has changed and in
> prior versions of python dictionaries did work as I expected them to.
> Now they don't.

Let's put it this way: Python 2.5 uncovered a bug in your
application that has always been there. It's better to
fix your application than to argue for covering up the bug again.

> When working with unicode strings and (accidentally) mixing with str
> strings, things might seem to work until the first non-ascii string
> is given to some code and one gets that UnicodeDecodeError (e.g. when
> comparing them).
>
> If one mixes unicode strings and str strings as keys in a dictionary
> things might seem to work far longer until he tries to put in some non
> ASCII string with the "wrong" hash value and suddenly things go boom.
> I'd rather keep the pre 2.5 behaviour.

It's actually a good preparation for Py3k where 1 == u'abc' will
(likely) also raise an exception.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 03 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
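
For illustration, here is a minimal sketch of the kind of application-side fix suggested above: decode every 8-bit string key to Unicode before it reaches the dictionary, so that the two key types never have to be compared. The helper name to_text is made up for this example, and the Latin-1 default is only an assumption, since the thread does not say which encoding the application's byte strings use.

```python
# Hypothetical helper, not from the original thread: normalise every key to
# text before it touches the dictionary, so 8-bit and Unicode keys never meet.

def to_text(key, encoding="latin-1"):
    """Return key as text, decoding byte strings with an explicit encoding.

    The latin-1 default is an assumption about where the bytes came from;
    pass whatever encoding the application really uses.
    """
    if isinstance(key, bytes):
        return key.decode(encoding)
    return key


d = {}
d[to_text(u"m\xe1s")] = 1
d[to_text(b"m\xe1s")] = 2  # decodes to the same text key, so this overwrites

print(len(d))
```

With the keys normalised at the boundary, the dictionary only ever holds one key type, so neither the hash values nor the comparison depend on whether a key happens to be plain ASCII.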