David Hopwood wrote:
> I disagree. Unicode strings should always be considered distinct from
> non-ASCII byte strings. Implicitly encoding or decoding in order to
> perform a comparison is a bad idea; it is expensive and will often do
> the wrong thing.

That's a pretty irrelevant position at this point; Python has had the
notion of a system encoding since Unicode was introduced, and we are not
going to remove that just before a release candidate of Python 2.5.

The question at hand is not whether certain objects should compare
unequal, but whether comparing them should raise an exception.

>>> Which of the two conversions is selected is arbitrary; [...]
>
> It would not be arbitrary. In the common case where the byte encoding
> uses "precomposed" characters, using "U.encode(system_encoding) == B"
> will tend to succeed in more cases than "B.decode(system_encoding) == U",
> because alternative representations of the same abstract character in
> Unicode will be mapped to the same precomposed character.

No, they won't (although they should, perhaps):

py> u'o\u0308'.encode("latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0308' in position 1: ordinal not in range(256)

In addition, it's also possible to find encodings (e.g. iso-2022) where
different byte sequences decode to the same Unicode string.

Regards,
Martin
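
The two points in the reply above can be reproduced with a minimal sketch. It assumes Python 3 (where text and bytes are separate types) rather than the Python 2.5 under discussion, uses "iso2022_jp" as one representative of the iso-2022 family, and introduces a hypothetical helper, try_encode, purely for illustration. The NFC step shows what a codec would have to do for decomposed and precomposed forms to encode identically; the latin-1 codec does not do this on its own.

```python
import unicodedata

# 'o' followed by COMBINING DIAERESIS: the decomposed form of o-umlaut.
decomposed = "o\u0308"
# NFC normalization composes it into the single code point U+00F6.
precomposed = unicodedata.normalize("NFC", decomposed)


def try_encode(text, encoding="latin-1"):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# The codec does not compose characters itself, so the decomposed form fails.
raw_result = try_encode(decomposed)
# After explicit normalization the same abstract character encodes fine.
nfc_result = try_encode(precomposed)

# Two different byte sequences that decode to the same text: the second one
# starts with a redundant "switch to ASCII" escape sequence (ESC ( B).
plain = b"abc"
with_escape = b"\x1b(Babc"
same_text = plain.decode("iso2022_jp") == with_escape.decode("iso2022_jp")

print(raw_result, nfc_result, same_text)
```

Under these assumptions, neither "encode, then compare bytes" nor "decode, then compare text" gives a canonical answer by itself: the first misses decomposed input unless it is normalized first, and the second treats distinct byte strings as equal.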