Re: Rich Comparisons Gotcha
Rhamphoryncus wrote:

> You grossly overvalue using the in operator on lists.

Maybe. But there is more to it than just 'in'. If you do:

    c = numpy.zeros((2,))
    ll = [1, c, 3.]

then the following all throw errors:

    3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
    c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)

Note how the presence of c in the list makes it behave wrongly for 3 as well.

> It's far more common to use a dict or set for containment tests, due to
> O(1) performance rather than O(n). I doubt the numpy array supports
> hashing, so an error for misuse is all you should expect.

Indeed it does not. So there is not much to be gained from modifying equality comparison with sets/dicts.

> In the rare case that you want to test for identity in a list, you can
> easily write your own function to do it upfront:
>
>     def idcontains(seq, obj):
>         for i in seq:
>             if i is obj:
>                 return True
>         return False

Again, you can code around any particular case (though wrappers look like a more robust solution). Still, why not get rid of this wart, if we can find a way?

---
Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002
--
http://mail.python.org/mailman/listinfo/python-list
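The identity-based workaround discussed in this message can be sketched a little further, covering index and remove as well as containment. This is only an illustration: `ArrayLike` and `AmbiguousTruth` are hypothetical stand-ins for a numpy array and its element-wise comparison result, used so that the example runs without numpy, and the `id_*` helper names are not from the thread.

```python
class AmbiguousTruth:
    """Stand-in for an element-wise comparison result (hypothetical)."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a numpy array: == never yields a plain bool."""

    def __eq__(self, other):
        return AmbiguousTruth()


def id_contains(seq, obj):
    """Return True if obj itself (not merely an equal object) is in seq."""
    return any(item is obj for item in seq)


def id_index(seq, obj):
    """Return the position of obj in seq, comparing by identity only."""
    for i, item in enumerate(seq):
        if item is obj:
            return i
    raise ValueError("object is not in sequence")


def id_remove(lst, obj):
    """Remove obj from lst, comparing by identity only."""
    del lst[id_index(lst, obj)]


c = ArrayLike()
ll = [1, c, 3.0]

try:
    3 in ll  # compares 3 with c, then calls bool() on the result
    raised = False
except ValueError:
    raised = True

print("plain 'in' raised:", raised)
print("id_contains(ll, c):", id_contains(ll, c))
```

The helpers never call `==`, so they cannot fail on such objects, but they also give up value semantics for everything else in the list (for example, `id_contains(ll, 3)` is false even though `3 == 3.0`).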
Re: Rich Comparisons Gotcha
Rhodri James wrote:

> On Mon, 08 Dec 2008 14:24:59 -, Rasmus Fogh wrote:
>> On the minus side there would be the difference between '__equal__' and
>> '__eq__' to confuse people.
>
> This is a very big minus. It would be far better to spell __equal__ in
> such a way as to make it clear why it wasn't the same as __eq__, otherwise
> you end up with the confusion that the Perl == and eq operators regularly
> cause.

You are probably right, unfortunately. That proposal is unlikely to fly. Do you think my latest proposal, raising BoolNotDefinedError, has better chances?
Re: Rich Comparisons Gotcha
Steven D'Aprano wrote:

> On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote:
>> For my personal problem I could indeed wrap all objects in a wrapper with
>> whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit
>> much, though, just to get code like this to work as intended:
>>
>>     alist.append(x)
>>     print ('x is present: ', x in alist)
>>
>> So, I would much prefer a language change. I am not competent to even
>> propose one properly, but I'll try.
>
> You think changing the language is easier than applying a wrapper to your
> own data??? Oh my, that's too funny for words.

Any individual case of the problem can be hacked somehow - I have already fixed this one. My point is that Python would be a better language if well-written classes that followed normal Python conventions could be relied on to work correctly with list, and that it is worth trying to bring this about. Lists are a central structure of the language after all. Of course you can disagree, or think the work required would be disproportionate, but surely there is nothing unreasonable about my point?

Rasmus
Re: Rich Comparisons Gotcha
Steven D'Aprano wrote:

> On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote:
> snip
>> What might be a sensible behaviour (unlike your proposed wrapper)

Sorry: 1) I was rude, and 2) I thanked TJR for your wrapper class proposal in a later mail. It is yours.

> What do you dislike about my wrapper class? Perhaps it is fixable.

I think it is a basic requirement for functioning lists that you get

    >>> alist = [1, x]
    >>> x in alist
    True
    >>> alist.remove(x)
    >>> alist
    [1]
    # unless of course x == 1, in which case the list is [x].

Your wrapper would not provide this behaviour. It is necessary to do

    if x is y:
        return True

be it in the eq() function, or in the list implementation. Note that this is the current Python behaviour for NaN in lists, whatever the mathematics say.

>> would be the following:
>>
>>     def eq(x, y):
>>         if x is y:
>>             return True
>
> I've already mentioned NaNs. Sentinel values also sometimes need to
> compare not equal with themselves. Forcing them to compare equal will
> cause breakage.

The list.__contains__ method already checks 'x is y' before it checks 'x == y'. I'd say that a list where my example above does not work is broken already, but of course I do not want to break further code. Could you give an example of this use of sentinel values?

>>         else:
>>             try:
>>                 return (x == y)
>>             except Exception:
>>                 return False
>
> Why False? Why not True? If an error occurs inside __eq__, how do you
> know that the correct result was False?
>
>     class Broken(object):
>         def __eq__(self, other):
>             return Treu  # oops, raises NameError

In managing collections the purpose of eq would be to divide objects into a small set that are all equal to each other, and a larger set that are all unequal to all members of the first set. That requires a default of False. If you default to True then eq(aNumpyArray, x) would return True for all x.

If an error occurs inside __eq__ it could be 1) because __eq__ is badly written, or 2) because the type of y was not considered by the implementers of x or is in some deep way incompatible with x. With 1) I cannot help, and for 2) I am simply saying that value semantics require an __eq__ that returns a truth value. In the absence of that I want identity semantics.

Rasmus
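The eq() function argued over in this message can be written out as a runnable sketch. Two things here are assumptions rather than the author's exact code: the result is passed through bool() inside the try block, so that a failing truth-value conversion is also caught, and `ArrayLike` / `AmbiguousTruth` are hypothetical stand-ins for a numpy array, so the example does not need numpy.

```python
class AmbiguousTruth:
    """Stand-in for an element-wise comparison result (hypothetical)."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a numpy array."""

    def __eq__(self, other):
        return AmbiguousTruth()


def eq(x, y):
    """Identity first, then value equality, defaulting to False on error."""
    if x is y:
        return True
    try:
        # bool() is applied here (an assumption of this sketch) so that an
        # unusable comparison result is treated like a failed comparison.
        return bool(x == y)
    except Exception:
        return False


def contains(seq, obj):
    return any(eq(item, obj) for item in seq)


def remove(lst, obj):
    for i, item in enumerate(lst):
        if eq(item, obj):
            del lst[i]
            return
    raise ValueError("object is not in list")


x = ArrayLike()
alist = [1, x]
print(contains(alist, x))  # identity match on x
remove(alist, x)
print(alist)
```

As the reply quoted above points out, catching every Exception also hides genuine bugs inside __eq__ (the `Broken` class would silently compare unequal to everything), which is the main cost of this design.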
Re: Rich Comparisons Gotcha
Mark Dickinson wrote:

> On Dec 8, 2:24 pm, Rasmus Fogh [EMAIL PROTECTED] wrote:
>> So, I would much prefer a language change. I am not competent to even
>> propose one properly, but I'll try.
>
> I don't see any technical problems in what you propose: as far as I can
> see it's entirely feasible. However: should.
>
>> On the minus side there would be the difference between '__equal__' and
>> '__eq__' to confuse people.
>
> I think this is exactly what makes the idea a non-starter. There are
> already enough questions on the lists about when to use 'is' and when to
> use '==', without adding an 'equals' function into the mix. It would add
> significant extra complexity to the core language, for questionable (IMO)
> gain.

So: It is perfectly acceptable behaviour to have __eq__ return a value that cannot be cast to a boolean, but it still does break the Python list. The fixes proposed so far all get the thumbs down, for various good reasons. How about:

- Define a new built-in Exception BoolNotDefinedError(ValueError)

- Have list.__contains__ (etc.) use the following comparison internally:

      def newCollectionTest(x,y):
          if x is y:
              return True
          else:
              try:
                  return bool(x == y)
              except BoolNotDefinedError:
                  return False

- Recommend that numpy.array.__nonzero__ and similar cases raise BoolNotDefinedError instead of ValueError

Objects that choose to raise BoolNotDefinedError will now work in lists, with identity semantics. Objects that do not raise BoolNotDefinedError have no change in behaviour.

It remains to be seen how hard it is to implement, and how much it slows down list.__contains__.

Rasmus
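The BoolNotDefinedError proposal can be tried out in pure Python. This is a sketch only: no such built-in exception exists, `ArrayLike`, `ElementwiseResult` and `Broken` are hypothetical classes, and the free function `contains` merely imitates what list.__contains__ would do under the proposal.

```python
class BoolNotDefinedError(ValueError):
    """Proposed exception (hypothetical): this object has no truth value."""


class ElementwiseResult:
    """Stand-in for an element-wise comparison result."""

    def __bool__(self):
        raise BoolNotDefinedError("truth value of an array is ambiguous")


class ArrayLike:
    """Hypothetical array type that opts in to the proposed convention."""

    def __eq__(self, other):
        return ElementwiseResult()


class Broken:
    """A genuinely buggy __eq__; its error should not be swallowed."""

    def __eq__(self, other):
        raise NameError("name 'Treu' is not defined")


def new_collection_test(x, y):
    if x is y:
        return True
    try:
        return bool(x == y)
    except BoolNotDefinedError:
        # Only the opt-in exception falls back to identity semantics.
        return False


def contains(seq, obj):
    return any(new_collection_test(item, obj) for item in seq)


a = ArrayLike()
print(contains([1, a, 3.0], 3))  # value semantics still work for numbers
print(contains([1, a], a))       # identity semantics for the array
```

Unlike a blanket `except Exception`, this version lets unrelated errors such as the NameError from `Broken` propagate, so bugs in __eq__ stay visible.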
Re: Rich Comparisons Gotcha
Rober Kern wrote: James Stroud wrote: Steven D'Aprano wrote: On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote: Rasmus Fogh wrote: ll1 = [y,1] y in ll1 True ll2 = [1,y] y in ll2 Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() I think you could be safe calling this a bug with numpy. Only in the sense that there are special cases where the array elements are all true, or all false, and numpy *could* safely return a bool. But special cases are not special enough to break the rules. Better for the numpy caller to write this: a.all() # or any() instead of: try: bool(a) except ValueError: a.all() as they would need to do if numpy sometimes returned a bool and sometimes raised an exception. I'm missing how a.all() solves the problem Rasmus describes, namely that the order of a python *list* affects the results of containment tests by numpy.array. E.g. y in ll1 and y in ll2 evaluate to different results in his example. It still seems like a bug in numpy to me, even if too much other stuff is broken if you fix it (in which case it apparently becomes an issue). It's an issue, if anything, not a bug. There is no consistent implementation of bool(some_array) that works in all cases. numpy's predecessor Numeric used to implement this as returning True if at least one element was non-zero. This works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not work well for bool(x==y) (which should be (x==y).all()), but many people got confused and thought that bool(x==y) worked. When we made numpy, we decided to explicitly not allow bool(some_array) so that people will not write buggy code like this again. You are so right, Robert: The deficiency is in the feature of rich comparisons, not numpy's implementation of it. 
__eq__() is allowed to return non-booleans; however, there are some parts of Python's implementation like list.__contains__() that still expect the return value of __eq__() to be meaningfully cast to a boolean. One might argue if this is a deficiency in rich comparisons or a rather a bug in list, set and dict. Certainly numpy is following the rules. In fact numpy should be applauded for throwing an error rather than returning a misleading value. For my personal problem I could indeed wrap all objects in a wrapper with whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) So, I would much prefer a language change. I am not competent to even propose one properly, but I'll try. First, to clear the air: Rich comparisons, the ability to overload '==', and the constraints (or lack of them) on __eq__ must stay unchanged. There are reasons for their current behaviour - ieee754 is particularly convincing - and anyway they are not going to change. No point in trying. There remains the problem is that __eq__ is used inside python 'collections' (list, set, dict etc.), and that the kind of overloading used (quite legitimately) in numpy etc. breaks the collection behaviour. It seems that proper behaviour of the collections requires an equality test that satisfies: 1) x equal x 2) x equal y = y equal x 3) x equal y and y equal z = x equal z 4) (x equal y) is a boolean 5) (x equal y) is defined (and will not throw an error) for all x,y 6) x unequal y == not(x equal y) (by definition) Note to TJR: 5) does not mean that Python should magically shield me from errors. All I am asking is that programmers design their equal() function to avoid raising errors, and that errors raised from equal() clearly count as bugs. I cannot imagine getting the collections to work in a simple and intuitive manner without an equality test that satisfies 1)-6). 
Maybe somebody else can. Instead I would propose adding an __equal__ special method for the purpose. It looks like the current collections use the folowing, at least in part def oldCollectionTest(x,y): if x is y: return True else: return (x == y) I would propose adding a new __equal__ method that satisfies 2) - 6) above. We could then define def newCollectionTest(x,y): if x is y: # this takes care of satisfying 1) return True elif hasattr(x, '__equal__'): return x.__equal__(y) elif hasattr(y, '__equal__'): return y.__equal__(x) else: return False The implementations for list, set and dict would then behave according to newCollectionTest. We would also want an equal() built-in with the same behaviour. In plain words, the default behaviour would be identity semantics. Objects that wanted value semantics could implement an __equal__ function with the correct behaviour. Wherever possible __equal__ would be the same as __eq__. This function may deviate from 'proper' behaviour in some cases. All I claim
Re: Rich Comparisons Gotcha
Robert Kern Wrote: Terry Reedy wrote: Rasmus Fogh wrote: Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed, What you are calling a misfeature is an absence, not a presence that can be removed. That's not quite true. Rich comparisons explicitly allow non-boolean return values. Breaking up __cmp__ into multiple __special__ methods was not the sole purpose of rich comparisons. One of the prime examples at the time was numpy (well, Numeric at the time). We wanted to use == to be able to return an array with boolean values where the two operand arrays were equal. E.g. In [1]: from numpy import * In [2]: array([1, 2, 3]) == array([4, 2, 3]) Out[2]: array([False, True, True], dtype=bool) SQLAlchemy uses these operators to build up objects that will be turned into SQL expressions. print users.c.id==addresses.c.user_id users.id = addresses.user_id Basically, the idea was to turn these operators into full-fledged operators like +-/*. Returning a non-boolean violates neither the letter, nor the spirit of the feature. Unfortunately, if you do overload __eq__ to build up expressions or whatnot, the other places where users of __eq__ are implicitly expecting a boolean break. While I was (and am) a supporter of rich comparisons, I feel Rasmus's pain from time to time. It would be nice to have an alternate method to express the boolean yes, this thing is equal in value to that other thing. Unfortunately, I haven't figured out a good way to fit it in now without sacrificing rich comparisons entirely. The best way, IMHO, would have been to use an alternative notation in numpy and SQLalchemy, and have '==' always return only a truth value - it could be a non-boolean as long as the bool() function gave the correct result. Surely the extra convenience of overloading '==' in special cases was not worth breaking such basic operations as 'bool(x == y)' or 'x in alist'. Again, the problem is only with '==', not with '', '=' etc. 
Of course it is done now, and unlikely to be reversed. and constrain the __eq__ function to always return a truth value. It is impossible to do that with certainty by any mechanical creation-time checking. So the implementation of operator.eq would have to check the return value of the ob.__eq__ function it calls *every time*. That would slow down the speed of the 99.xx% of cases where the check is not needed and would still not prevent exceptions. And if the return value was bad, all operator.eq could do is raise and exception anyway. Sure, but then it would be a bug to return a non-boolean from __eq__ and friends. It is not a bug today. I think that's what Rasmus is proposing. Yes, that is the point. If __eq__ functions are *supposed* to return booleans I can write generic code that will work for well-behaved objects, and any errors will be somebody elses fault. If __eq__ is free to return anything, or throw an error, it becomes my responsibility to write generic code that will work anyway, including with floating point numbers, numpy, or SQLalchemy. And I cannot see any way to do that (suggestions welcome). If purportedly general code does not work with numpy, your average numpy user will not be receptive to the idea that it is all numpys fault. Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False ll = [x] x in ll True x == ll[0] False import numpy y = numpy.zeros((3,)) y array([ 0., 0., 0.]) bool(y==y) Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() ll1 = [y,1] y in ll1 True ll2 = [1,y] y in ll2 Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() Can anybody see a way this could be fixed (please)? I may well have to live with it, but I would really prefer not to. 
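The NaN half of the inconsistency shown above needs nothing beyond the standard library, so it is easy to reproduce. The snippet below is a small sketch of it; the variable names follow the session quoted in the message.

```python
x = float("nan")
ll = [x]

checks = {
    "x == x": x == x,                          # NaN is unequal to itself
    "x in ll": x in ll,                        # identity is checked first
    "x == ll[0]": x == ll[0],                  # same object, still unequal
    "ll.index(x)": ll.index(x),                # found via identity
    "float('nan') in ll": float("nan") in ll,  # a different NaN object
}

for label, value in checks.items():
    print(label, "->", value)
```

The list operations succeed for `x` only because containers test `is` before `==`; a second NaN object is not found, which matches the identity semantics described in the thread.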
Re: Rich Comparisons Gotcha
James Stroud wrote:

> Rasmus Fogh wrote:
>> Dear All,
>>
>> For the first time I have come across a Python feature that seems
>> completely wrong. After the introduction of rich comparisons, equality
>> comparison does not have to return a truth value, and may indeed return
>> nothing at all and throw an error instead. As a result, code like
>>
>>     if foo == bar:
>>
>> or
>>
>>     foo in alist
>>
>> cannot be relied on to work. This is clearly no accident. According to
>> the documentation all comparison operators are allowed to return
>> non-booleans, or to throw errors. There is explicitly no guarantee that
>> x == x is True.
>
> I'm not a computer scientist, so my language and perspective on the topic
> may be a bit naive, but I'll try to demonstrate my caveman understanding
> by example. First, here is why the ability to throw an error is a feature:
>
>     class Apple(object):
>         def __init__(self, appleness):
>             self.appleness = appleness
>         def __cmp__(self, other):
>             assert isinstance(other, Apple), 'must compare apples to apples'
>             return cmp(self.appleness, other.appleness)
>
>     class Orange(object): pass
>
>     Apple(42) == Orange()

True, but that does not hold for __eq__, only for __cmp__, and for __gt__, __le__, etc. Consider:

    class Apple(object):
        def __init__(self, appleness):
            self.appleness = appleness
        def __gt__(self, other):
            assert isinstance(other, Apple), 'must compare apples to apples'
            return (self.appleness > other.appleness)
        def __eq__(self, other):
            if isinstance(other, Apple):
                return (self.appleness == other.appleness)
            else:
                return False

> Second, consider that any value in python also evaluates to a truth value
> in boolean context. Third, every function returns something. A function's
> returning nothing is not a possibility in the python language. None is
> something but evaluates to False in boolean context.

Indeed. The requirement would be not that return_value was a boolean, but that bool(return_value) was defined and gave the correct result.

I understand that in some old Numeric/numpy version the numpy array __eq__ function returned a non-empty array, so that bool(numarray1 == numarray2) was true for any pair of arguments, which is one way of breaking '=='. In current numpy, even bool(numarray1 == 1) throws an error, which is another way of breaking '=='.

>> But surely you can define an equal/unequal classification for all types
>> of object, if you want to?
>
> This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
> Even in the realm of pure mathematics, the generality of objects (i.e.
> numbers) can not be assumed.

It sounds like that problem is simpler in computing. sqrt(32) evaluates to 5.6568542494923806 on my computer. A complex number c with non-zero imaginary part would be unequal to sqrt(32) even if it so happened that c*c==32.

Yours,
Rasmus
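The Apple example from this exchange can be rewritten for Python 3, where __cmp__ no longer exists. The sketch below is not code from the thread: it uses `NotImplemented` rather than an assert, which is the conventional way to say "I do not know how to compare with that type". With it, equality against a foreign type quietly falls back to False, while ordering raises TypeError.

```python
class Apple:
    def __init__(self, appleness):
        self.appleness = appleness

    def __eq__(self, other):
        if not isinstance(other, Apple):
            return NotImplemented  # let Python fall back to identity
        return self.appleness == other.appleness

    def __hash__(self):
        # Defined explicitly because overriding __eq__ removes the default.
        return hash(self.appleness)

    def __gt__(self, other):
        if not isinstance(other, Apple):
            return NotImplemented  # ordering against oranges is an error
        return self.appleness > other.appleness


class Orange:
    pass


equal_to_orange = Apple(42) == Orange()

try:
    Apple(42) > Orange()
    ordering_raised = False
except TypeError:
    ordering_raised = True

print(equal_to_orange, ordering_raised)
```

This keeps the distinction drawn in the message: an equal/unequal answer is available for every pair of objects, whereas an ordering is only defined where it makes sense.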
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 13:03:43 +, Rasmus Fogh wrote: Jamed Stroud Wrote: ... Second, consider that any value in python also evaluates to a truth value in boolean context. But bool(x) can fail too. So not every object in Python can be interpreted as a truth value. Third, every function returns something. Unless it doesn't return at all. A function's returning nothing is not a possibility in the python language. None is something but evaluates to False in boolean context. Indeed. The requirement would be not that return_value was a boolean, but that bool(return_value) was defined and gave the correct result. If __bool__ or __nonzero__ raises an exception, you would like Python to ignore the exception and return True or False. Which should it be? How do you know what the correct result should be? From the Zen of Python: In the face of ambiguity, refuse the temptation to guess. All binary operators are ambiguous when dealing with vector or array operands. Should the operator operate on the array as a whole, or on each element? The numpy people have decided that element-wise equality testing is more useful for them, and this is their prerogative to do so. In fact, the move to rich comparisons was driven by the needs of numpy. http://www.python.org/dev/peps/pep-0207/ It is a *VERY* important third-party library, and this was not the first and probably won't be the last time that their needs will move into Python the language. Python encourages such domain-specific behaviour. In fact, that's what operator-overloading is all about: classes can define what any operator means for *them*. There's no requirement that the infinity of potential classes must all define operators in a mutually compatible fashion, not even for comparison operators. For example, consider a class implementing one particular version of three-value logic. 
It isn't enough for == to only return True or False, because you also need Maybe: True == False = returns False True == True = returns True True == Maybe = returns Maybe etc. Or consider fuzzy logic, where instead of two truth values, you have a continuum of truth values between 0.0 and 1.0. What should comparing two such fuzzy values for equality return? A boolean True/False? Another fuzzy value? Another one from the Zen: Special cases aren't special enough to break the rules. The rules are that classes can customize their behaviour, that methods can fail, and that Python should not try to guess what the correct value should have been in the event of such a failure. Equality is a special case, but it isn't so special that it needs to be an exception from those rules. If you really need a guaranteed-can't-fail[1] equality test, try something like this untested wrapper class: class EqualityWrapper(object): def __init__(self, obj): self.wrapped = obj def __eq__(self, other): try: return bool(self.wrapped == other) except Exception: return False # or maybe True? Now wrap all your data: data = [a list of arbitrary objects] data = map(EqualityWrapper, data) process(data) [1] Not a guarantee. Well, lots to think about. Just to keep you from shooting at straw men: I would have liked it to be part of the design contract (a convention, if you like) that 1) bool(x == y) should return a boolean and never throw an error 2) x == x return True I do *not* say that bool(x) should never throw an error. I do *not* say that Python should guess a return value if an __eq__ function throws an error, only that it should have been considered a bug, or at least bad form, for __eq__ functions to do so. 
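The wrapper class suggested above can be exercised with a short runnable sketch. `ArrayLike` and `AmbiguousTruth` are hypothetical stand-ins for a numpy array, added here so the example runs without numpy; the wrapper itself follows the quoted code, with a list comprehension in place of map().

```python
class AmbiguousTruth:
    """Stand-in for an element-wise comparison result (hypothetical)."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a numpy array."""

    def __eq__(self, other):
        return AmbiguousTruth()


class EqualityWrapper:
    def __init__(self, obj):
        self.wrapped = obj

    def __eq__(self, other):
        try:
            return bool(self.wrapped == other)
        except Exception:
            return False  # the thread debates whether this should be True


c = ArrayLike()
raw = [1, c, 3.0]
data = [EqualityWrapper(obj) for obj in raw]

three_found = 3 in data  # the array no longer breaks the search for 3
c_found = c in data      # but the array itself is not found

print(three_found, c_found)
```

The second result illustrates the objection raised elsewhere in the thread: because the wrapper has no `is` check on the wrapped object, an item whose comparison fails is never reported as present, even when it is in the list.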
What might be a sensible behaviour (unlike your proposed wrapper) would be the following:

    def eq(x, y):
        if x is y:
            return True
        else:
            try:
                return (x == y)
            except Exception:
                return False

If it is possible to change the language, how about having two different functions, one for overloading the '==' operator, and another for testing list and set membership, dictionary key identity, etc.? For instance like this:

- Add a new function __equals__; x.__equals__(y) could default to bool(x.__eq__(y))

- Establish by convention that x.__equals__(y) must return a boolean and may not intentionally throw an error.

- Establish by convention that 'x is y' implies 'x.__equals__(y)', in the sense that (not (x is y and not x.__equals__(y))) must always hold

- Have the Python data structures call __equals__ when they want to compare objects internally (e.g. for 'x in alist', 'x in adict', 'set(alist)', etc.)

- Provide an equals(x,y) built-in that calls the __equals__ function

- numpy and others who (mis)use '==' for their own purposes could use

      def __equals__(self, other):
          return (self is other)

For the float NaN case it looks like things are already behaving like this. For numpy objects you would not lose anything, since 'numpyArray in alist' is broken
Rich Comparisons Gotcha
Dear All,

For the first time I have come across a Python feature that seems completely wrong. After the introduction of rich comparisons, equality comparison does not have to return a truth value, and may indeed return nothing at all and throw an error instead. As a result, code like

    if foo == bar:

or

    foo in alist

cannot be relied on to work.

This is clearly no accident. According to the documentation all comparison operators are allowed to return non-booleans, or to throw errors. There is explicitly no guarantee that x == x is True.

Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed, and constrain the __eq__ function to always return a truth value. That is clearly not likely to happen. Unless I have misunderstood something, could somebody explain to me

1) Why was this introduced? I can understand relaxing the restrictions on '<', '<=' etc. - after all you cannot define an ordering for all types of object. But surely you can define an equal/unequal classification for all types of object, if you want to? Is it just the numpy people wanting to type 'a == b' instead of 'equals(a,b)', or is there a better reason?

2) If I want to write generic code, can I somehow work around the fact that

    if foo == bar:

or

    foo in alist

does not work for arbitrary objects?

Yours,
Rasmus

Some details:

CCPN has a table display class that maintains a list of arbitrary objects, one per line in the table. The table class is completely generic, and subclassed for individual cases. It contains the code:

    if foo in tbllist:
        ...
    else:
        ...
        tbllist.append(foo)
        ...

One day the 'if' statement gave this rather obscure error:

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

A subclass had used objects passed in from some third party code, and as it turned out foo happened to be a tuple containing a tuple containing a numpy array.

Some more precise tests gave the following:

    # Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22)
    # [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2

    # set up
    import numpy
    a = float('NaN')
    b = float('NaN')
    ll = [a,b]
    c = numpy.zeros((2,3))
    d = numpy.zeros((2,3))
    mm = [c,d]

    # try NaN
    print (a == a)        # gives False
    print (a is a)        # gives True
    print (a == b)        # gives False
    print (a is b)        # gives False
    print (a in ll)       # gives True
    print (b in ll)       # gives True
    print (ll.index(a))   # gives 0
    print (ll.index(b))   # gives 1

    # try numpy array
    print (c is c)        # gives True
    print (c is d)        # gives False
    print (c in mm)       # gives True
    print (mm.index(c))   # 0
    print (c == c)        # gives [[ True True True][ True True True]]
    print (c == d)        # gives [[ True True True][ True True True]]
    print (bool(1 == c))  # raises error - see below
    print (d in mm)       # raises error - see below
    print (mm.index(d))   # raises error - see below
    print (c in ll)       # raises error - see below
    print (ll.index(c))   # raises error - see below

The error was the same in each case:

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
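The "tuple containing a tuple containing a numpy array" case described above can be reproduced without numpy. In the sketch below, `ArrayLike` and `AmbiguousTruth` are hypothetical stand-ins for the array and its element-wise comparison result, and the row contents are made up for illustration. Tuple equality compares element by element, so the failing comparison surfaces from two levels down.

```python
class AmbiguousTruth:
    """Stand-in for an element-wise comparison result (hypothetical)."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a numpy array."""

    def __eq__(self, other):
        return AmbiguousTruth()


c, d = ArrayLike(), ArrayLike()
tbllist = [((c,), "row 1")]

# A freshly built tuple around the *same* array object is found, because
# the nested comparison reaches c and short-circuits on identity.
same_array_found = ((c,), "row 1") in tbllist

# A tuple around a *different* array forces c == d, whose result cannot be
# converted to a truth value, so the membership test itself raises.
try:
    ((d,), "row 1") in tbllist
    raised = False
except ValueError:
    raised = True

print(same_array_found, raised)
```

This is why the error appeared in generic table code that never mentioned arrays: the container only sees tuples, and the failure depends on what happens to be nested inside them.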