On Aug 3, 2006, at 9:51 AM, M.-A. Lemburg wrote:
> Ralf Schmitt wrote:
>> Ralf Schmitt wrote:
>>> Still trying to port our software. here's another thing I noticed:
>>>
>>> d = {}
>>> d[u'm\xe1s'] = 1
>>> d['m\xe1s'] = 1
>>> print d
>>>
>>> With python 2.4 I can add those two keys to the dictionary and get:
>>> $ python2.4 t2.py
>>> {u'm\xe1s': 1, 'm\xe1s': 1}
>>>
>>> With python 2.5 I get:
>>>
>>> $ python2.5 t2.py
>>> Traceback (most recent call last):
>>>   File "t2.py", line 3, in <module>
>>>     d['m\xe1s'] = 1
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 1:
>>> ordinal not in range(128)
>>>
>>> Is this intended behaviour? I guess this might break lots of programs
>>> and the way python 2.4 works looks right to me.
>>> I think it should be possible to mix str/unicode keys in dicts and let
>>> non-ascii strings compare not-equal to any unicode string.
>>
>> Also this behaviour makes your programs break randomly, that is, it will
>> break when the string you add hashes to the same value that the unicode
>> string has (at least that's what I guess..)
>
> This is because Unicode and 8-bit string keys only work
> in the same way if and only if they are plain ASCII.
>
> The reason lies in the hash function used by Unicode: it is
> crafted to make hash(u) == hash(s) for all ASCII s, such
> that s == u.
>
> For non-ASCII strings, there are no guarantees as to the
> hash value of the strings or whether they match or not.
>
> This has been like that since Unicode was introduced, so it's
> not new in Python 2.5.

What is new is that the exception raised by the "u == s" comparison
after a hash collision is no longer silently swallowed.

-bob
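
For anyone who wants to see the mechanism in isolation, here is a minimal sketch. It does not use str/unicode at all: CollidingKey and insert_both are hypothetical names made up for illustration, the fixed hash value forces the collision, and the ValueError merely stands in for the UnicodeDecodeError from the thread. The point is only that a dict falls back to == when two distinct keys hash alike, and that an exception raised by that comparison reaches the caller.

```python
class CollidingKey(object):
    """Hypothetical key type (not from the thread).

    Every instance has the same hash, and comparing two instances
    raises, standing in for the failing u == s comparison.
    """

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        # Constant hash: any two keys collide in the dict.
        return 42

    def __eq__(self, other):
        # Stand-in for the UnicodeDecodeError raised by u == s.
        raise ValueError("cannot compare these keys")


def insert_both():
    d = {}
    # First insert: the dict is empty, so no comparison is needed.
    d[CollidingKey("unicode")] = 1
    try:
        # Second insert: same hash, different object, so the dict
        # calls __eq__ and the exception propagates to the caller.
        d[CollidingKey("bytes")] = 1
    except ValueError as exc:
        return "raised: %s" % exc
    return "no exception"


print(insert_both())
```

With keys whose hashes differ, the comparison is never attempted, which is why the failure in the original report looks random.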