Bugs item #1564763, was opened at 2006-09-24 23:43
Message generated for change (Comment added) made by arigo
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1564763&group_id=5470

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Joe Wreschnig (piman)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Unicode comparison change in 2.4 vs. 2.5

Initial Comment:
Python 2.5 changed the behavior of unicode comparisons in a
significant way from Python 2.4, causing a test case failure in a
module of mine. All tests passed with an earlier version of 2.5,
though unfortunately I don't know what version in particular it
started failing with.

The following code prints out all True on Python 2.4; the strings are
compared case-insensitively, whether they are my lowerstr class, real
strs, or unicodes. On Python 2.5, the comparison between lowerstr and
unicode is false, but only in one direction.

If I make lowerstr inherit from unicode rather than str, all
comparisons are true again. So at the very least, this is internally
inconsistent. I also think changing the behavior between 2.4 and 2.5
constitutes a serious bug.

----------------------------------------------------------------------

>Comment By: Armin Rigo (arigo)
Date: 2006-09-27 08:58

Message:
Logged In: YES
user_id=4771

Well, yes, that's what I tried to explain. I also tried to explain
how the 2.5 behavior is the "right" one, and the previous 2.4
behavior is a mere accident of convoluted __eq__-vs-__cmp__ code
paths in the comparison code.

In other words, there is no chance to get the 2.4 behavior in, say,
Python 3000, because the __cmp__-related convolutions will be gone
and we will only have the "right" behavior left.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2006-09-26 11:13

Message:
Logged In: YES
user_id=38388

In any case, the introduction of the Unicode tp_richcompare slot is
likely the cause for this behavior:

    $ python2.5 lowerstr.py
    u'baR' == l'Bar'? False
    $ python2.4 lowerstr.py
    u'baR' == l'Bar'? True

Note that in both Python 2.4 and 2.5, the lowerstr.__eq__() method is
not even called. This is probably due to the fact that Unicode can
compare itself to strings, so the w.__eq__(v) part of the rich
comparison is never tried.

Now, the Unicode .__eq__() converts the string to Unicode, so the
right hand side becomes u'Bar' in both cases.

I guess a debugger session is due...

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2006-09-26 10:55

Message:
Logged In: YES
user_id=38388

Ah, wrong track: Py_TPFLAGS_HAVE_RICHCOMPARE is set via
Py_TPFLAGS_DEFAULT.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2006-09-26 10:39

Message:
Logged In: YES
user_id=38388

Armin, is it possible that the missing Py_TPFLAGS_HAVE_RICHCOMPARE
type flag in the Unicode type is causing this?

I just had a look at the code and it appears that the comparison code
checks the flag rather than just looking at the slot itself (didn't
even know there was such a type flag).

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-09-25 21:33

Message:
Logged In: YES
user_id=4771

Sorry, I missed your comment: if lowerstr inherits from unicode then
it just works. The reason is that 'abc'.__eq__(u'abc') returns
NotImplemented, but u'abc'.__eq__('abc') returns True.
This is only inconsistent because of the asymmetry between strings
and unicodes: strings can be transparently turned into unicodes but
not the other way around -- so unicode.__eq__(x) can accept a string
as the argument x and convert it to a unicode transparently, but
str.__eq__(x) does not try to convert x to a string if it is a
unicode.

It's not a completely convincing explanation, but I think it shows at
least how we arrived at the current situation of Python 2.5.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-09-25 21:11

Message:
Logged In: YES
user_id=4771

This is an artifact of the change in the unicode class, which now has
the proper __eq__, __ne__, __lt__, etc. methods instead of the
semi-deprecated __cmp__. The mixture of __cmp__ and the other methods
is not very well-defined. This is why your code worked in 2.4: a bit
by chance. Indeed, in theory it should not, according to the language
reference.

So what I am saying is that although it is a behavior change from 2.4
to 2.5, I would argue that it is not a bug but a bug fix...

The reason is that if we ignore the __eq__ vs __cmp__ issues, the
operation 'a == b' is defined as: Python tries a.__eq__(b); if this
returns NotImplemented, then Python tries b.__eq__(a). As an
exception, if type(b) is a strict subclass of type(a), then Python
tries in the other order. This is why you get the 2.5 behavior: if
lowerstr inherits from str, it is not a subclass of unicode, so
u'abc' == lowerstr() tries u'abc'.__eq__(), which works immediately.
On the other hand, if lowerstr inherits from unicode, then Python
tries first lowerstr().__eq__(u'abc').

This part of the Python object model - when to reverse the order or
not - is a bit obscure and not completely helpful... Subclassing
built-in types generally only works a bit.
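
The dispatch order described above can be sketched in a few lines.
This is only an illustration under stated assumptions: it uses
Python 3 semantics (no __cmp__), and the classes Byt, Uni, LowerByt
and LowerUni are made-up stand-ins for str, unicode and the two
lowerstr variants. It is not the reporter's lowerstr.py, and the
helper compare() is likewise hypothetical.

```python
calls = []  # records which __eq__ implementations actually ran


class Byt:
    """Stand-in for 2.x str: cannot compare itself with Uni."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        calls.append("Byt")
        if isinstance(other, Byt):
            return self.text == other.text
        return NotImplemented


class Uni:
    """Stand-in for 2.x unicode: also accepts Byt operands."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        calls.append("Uni")
        if isinstance(other, (Uni, Byt)):
            return self.text == other.text
        return NotImplemented


class LowerByt(Byt):
    """Case-insensitive subclass of the str stand-in."""

    def __eq__(self, other):
        calls.append("LowerByt")
        if isinstance(other, (Uni, Byt)):
            return self.text.lower() == other.text.lower()
        return NotImplemented


class LowerUni(Uni):
    """Case-insensitive subclass of the unicode stand-in."""

    def __eq__(self, other):
        calls.append("LowerUni")
        if isinstance(other, (Uni, Byt)):
            return self.text.lower() == other.text.lower()
        return NotImplemented


def compare(a, b):
    """Return (a == b, list of __eq__ implementations that were tried)."""
    calls.clear()
    result = a == b
    return result, list(calls)


# Left operand answers first; LowerByt.__eq__ is never consulted.
print(compare(Uni("baR"), LowerByt("Bar")))   # (False, ['Uni'])
# Same operands swapped: now the case-insensitive method runs.
print(compare(LowerByt("Bar"), Uni("baR")))   # (True, ['LowerByt'])
# Right operand is a strict subclass of the left type, so it goes first.
print(compare(Uni("baR"), LowerUni("Bar")))   # (True, ['LowerUni'])
# NotImplemented on the left triggers the fallback to the right operand.
print(compare(Byt("abc"), Uni("abc")))        # (True, ['Byt', 'Uni'])
```

The first and third calls differ only in which base class the
case-insensitive type inherits from, which is the asymmetry reported
in this item.
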
In your situation you should use a regular class that behaves in a
string-like fashion, with an __eq__() method doing the
case-insensitive comparison... if you can at all - there are places
where you need a real string, so this "solution" might not be one
either, but I don't see a better one :-(

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com