Re: Rich Comparisons Gotcha
On Wed, 07 Jan 2009 01:23:19 +, Mark Wooding wrote:

> A case-sensitive string is /not the same/ as a case-insensitive string. One's a duck, the other's a goose. I'd claim here that iabc =~ ABC must be False, because iabc =~ abc must be false also! To define it otherwise leads to the incoherence you describe.

It's only incoherent if you need equality to be an equivalence relation. If you don't, it is perfectly reasonable to declare that iabc equals abc.

-- Steven
-- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:

> It's only incoherent if you need equality to be an equivalence relation. If you don't, it is perfectly reasonable to declare that iabc equals abc.

Right! And if you didn't want an equivalence relation, then `==' will suit you fine. The problem is that some applications seem to /want/ an equivalence relation, and one that's more useful (i.e., less discriminating) than `is'.

-- [mdw]
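The disagreement over iabc and abc can be made concrete. What follows is a minimal sketch, not anything posted in the thread: the class name IStr and its behaviour are assumptions chosen to match the "it is perfectly reasonable to declare that iabc equals abc" position. It shows that each comparison looks sensible on its own while the three together break transitivity.

```python
class IStr:
    """Hypothetical case-insensitive string wrapper (illustration only)."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Compare case-insensitively against both IStr and plain str.
        if isinstance(other, IStr):
            return self.text.lower() == other.text.lower()
        if isinstance(other, str):
            return self.text.lower() == other.lower()
        return NotImplemented

    def __hash__(self):
        return hash(self.text.lower())


iabc = IStr("abc")

left = (iabc == "abc")    # True: case-insensitive match
right = (iabc == "ABC")   # True: case-insensitive match
ends = ("abc" == "ABC")   # False: plain str comparison is case-sensitive

# Transitivity would require: left and right implies ends.
transitive = (not (left and right)) or ends
print(left, right, ends, transitive)   # True True False False
```

Under the other position in the thread, IStr.__eq__ would return False for every plain str, and the relation would stay an equivalence at the cost of never matching ordinary strings.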
Re: Rich Comparisons Gotcha
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: Such assumptions only hold under particular domains though. You can't assume equality is an equivalence relation once you start thinking about arbitrary domains. From a formal mathematical point of view, equality /is/ an equivalence relation. If you have a relation on some domain, and it's not an equivalence relation, then it ain't the equality relation, and that's flat. But there cannot be any such function which is a domain-independent equivalence relation, not if we're talking about arbitrarily wacky domains. That looks like a claim which requires a proof to me. But it could also do with a definition of `domain', so I'll settle for one of those first. If we're dealing with sets (i.e., `domain's form a subclass of `sets') then the claim is clearly false, and equality (determined by comparison of elements) is indeed a domain-independent equivalence relation. Even something as straight-forward as is can't be an equivalence relation under a domain where identity isn't well-defined. You've completely lost me here. The Python `is' operator is (the characteristic function of) an equivalence relation on Python values: that's its definition. You could describe an extension of the `is' relation to a larger set of items, such that it fails to be an equivalence relation on that set, but you'd be (rightly) criticized for failing to preserve one of its two defining properties. (The other is that `is' makes distinctions between values which are at least as fine as any other method, and this property should also be extended .) Let me have another go. All Python objects are instances of `object' or of some more specific class. The `==' operator on `object' is (the characteristic function of) an equivalence relation. In, fact, it's the same as `is' -- but `==' can be overridden by subclasses, and subclasses are permitted -- according to the interface definition -- to coarsen the relation. 
In fact, they're permitted to make it not be an equivalence class at all. I claim that this is a problem. I /agree/ that domain-specific predicates are useful, and can be sufficiently useful that they deserve the `==' name -- as well as floats and numpy, I've provided SAGE and sympy as examples myself. But I also believe that there are good reasons to want an `equivalence' operator (I'll write it as `=~', though I don't propose this as Python syntax -- see below) with the following properties: * `=~' is the characteristic function[1] of an equivalence relation, i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) == True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if x is y then x =~ y. A valuable property might be that x =~ y if x and y are indistinguishable without using `is'. That would mean immediately that 'xyz' =~ 'xy' + 'z' (regardless of interning, because strings are immutable). But for tuples this would imply elementwise comparison, which may be expensive -- and, in the case of tuples manufactured by C extensions, nontrivial because manufactured tuples need not be acyclic. On the other hand, `==' is already recursive on tuples. We can envisage a collection of different relations, according to which distinguishing methods we're willing to disallow. For example, for numerical types, there are actually a number of interesting relations, according to whether you think the answers to the following questions are true or false. * Is 1 =~ 1/1? (Here, 1 is an integer, and 1/1 is a rational number; both are the multiplicative identities of their respective rings. I'd suggest that it doesn't seem very useful to say `no' here, but there might be reasons why one would want type(x) is type(y) if x =~ y.) * Is 1 =~ 1.0? (This is trickier. Numerically the values are equal; but the former is exact and the latter inexact, and this is a good reason to want a separation.) 
Essentially, these are asking whether `type' is a legitimate distinguisher, and I think that the answer, unhelpful as it may be, is `sometimes'. A third useful distinguishing technique is mutation. Given two singleton lists whose respective elements compare equivalent, I can mutate one of them to decide whether the other is in fact the same. Is this something which `=~' should distinguish? Again, the answer is probably `sometimes'. To summarize: we're left with at least three different characteristics which an equivalence predicate might have: * efficient (e.g., bounded recursion depth, works on circular values); * neglects irrelevant (to whom?) differences of type; and * neglects differences due to mutability. A predicate used to compare set elements or hash-table keys should probably /respect/ mutability. (Associating hashing with this predicate, rather than `==', would coherently allow mutable objects such as lists to be
Re: Rich Comparisons Gotcha
On Tue, 06 Jan 2009 12:42:13 +, Mark Wooding wrote: Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: Such assumptions only hold under particular domains though. You can't assume equality is an equivalence relation once you start thinking about arbitrary domains. From a formal mathematical point of view, equality /is/ an equivalence relation. If you have a relation on some domain, and it's not an equivalence relation, then it ain't the equality relation, and that's flat. Okay, fair enough. In the formal mathematical sense, equality is always an equivalence relation. So there are certain domains which don't have equality, e.g. floating point, since nan != nan. Also Python objects, since x.__eq__(y) is not necessarily the same as y.__eq__(x). But there cannot be any such function which is a domain-independent equivalence relation, not if we're talking about arbitrarily wacky domains. That looks like a claim which requires a proof to me. But it could also do with a definition of `domain', so I'll settle for one of those first. I'm talking about domain in the sense of a particular problem domain. That is, the model, data and operations used to solve a problem. I don't know that I can be more formal than that. To prove my claim, all you need is two domains with a mutually incompatible definition of equality. That's not so difficult, surely? How about equality of integers, versus equality of integers modulo some N? If we're dealing with sets (i.e., `domain's form a subclass of `sets') then the claim is clearly false, and equality (determined by comparison of elements) is indeed a domain-independent equivalence relation. It isn't domain-independent in my sense, because you have specified one specific domain, namely set equality. Even something as straight-forward as is can't be an equivalence relation under a domain where identity isn't well-defined. You've completely lost me here. 
The Python `is' operator is (the characteristic function of) an equivalence relation on Python values: that's its definition. Yes, that's because identity is well-defined in Python. I'm saying that if identity isn't well-defined, then neither is the 'is' operator, and therefore it isn't an equivalence relation. That shouldn't be controversial. All Python objects are instances of `object' or of some more specific class. The `==' operator on `object' is (the characteristic function of) an equivalence relation. In, fact, it's the same as `is' -- but `==' can be overridden by subclasses, and subclasses are permitted -- according to the interface definition -- to coarsen the relation. In fact, they're permitted to make it not be an equivalence class at all. I claim that this is a problem. It *can* be a problem, if you insist on using == on arbitrary types while still expecting it to be an equivalence relation. If you drop the requirement that it remain an e-r, then you can apply == to arbitrary types. And if you limit yourself to non-arbitrary types, then you can safely use (say) any strings you like, and == will remain an e-r. I /agree/ that domain-specific predicates are useful, and can be sufficiently useful that they deserve the `==' name -- as well as floats and numpy, I've provided SAGE and sympy as examples myself. But I also believe that there are good reasons to want an `equivalence' operator (I'll write it as `=~', though I don't propose this as Python syntax -- see below) with the following properties: * `=~' is the characteristic function[1] of an equivalence relation, i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) == True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if x is y then x =~ y. Ah, but you can't have such a generic e-r that applies across all problem domains. 
Consider: Let's denote regular, case-sensitive strings using abc, and special, case-insensitive strings using iabc. So for regular strings, equality is an e-r; for case-insensitive strings, equality is also an e-r (I trust that the truth of this is obvious). But if you try to use equality on *both* regular and case-insensitive strings, it fails to be an e-r: iabc =~ ABC returns True if you use the case-insensitive definition of equality, but returns False if you use the case-sensitive definition. There is no single definition of equality that is *simultaneously* case- sensitive and case-insensitive. A valuable property might be that x =~ y if x and y are indistinguishable without using `is'. That's a little strong, because it implies that equality must look at *everything* about a particular object, not just whatever bits of data are relevant for the problem domain. For example, consider storing data in a dict. D1 = {-1: 0, -2: 0} D2 = {-2: 0} D2[-1] = 0 D1 == D2 True We certainly want D1 and D2 to be
Re: Rich Comparisons Gotcha
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: To prove my claim, all you need is two domains with a mutually incompatible definition of equality. That's not so difficult, surely? How about equality of integers, versus equality of integers modulo some N? No, that's not an example. The integers modulo N form a ring Z/NZ of residue classes. Such residue classes are distinct from the integers -- e.g., an integer 3 (say) is not the same as the set 3 + NZ { ..., 3 - 2N, 3 - N, 3, 3 + N, 3 + 2N, ... } -- but there is a homomorphism from Z to Z/NZ under which 3 + NZ is the image of 3. If we decide to define the == operator such that 3 == 3 + NZ and 3 + N == 3 + NZ then == is not an equivalence relation (in particular, transitivity fails). But that's just an artifact of the definition. If we distinguish 3 from 3 + NZ then everything is fine. 3 + NZ == (3 + N) + NZ correctly, but 3 != 3 + N, and all is well. Here, at least, the problem is not that == as an equivalence relation fails in some particular domain -- because in both Z and Z/NZ it can be a perfectly fine equivalence relation -- but that it can potentially fail on the boundaries between domains. Easy answer: don't mess it up at the boundaries. Proposition. Let U, U' be disjoint sets, and let E, E' be equivalence relations on U, U' respectively. Define E^ on U union U' as E^ = E union E', i.e., E^(x, y) iff x in U and y in U and E(x, y) or x in U' and y in U' and E'(x, y) Then E^ is an equivalence relation. Proof. Reflexivity and symmetry are trivial; transitivity follows from disjointness of U and U'. It *can* be a problem, if you insist on using == on arbitrary types while still expecting it to be an equivalence relation. Unfortunately, from the surrounding discussion, it seems that container types particularly want to be able to contain arbitrary objects, and the failure of == to be a equivalence relation makes this fail. 
The problem is that objects with wacky == operators are still more or less quacking like the more usual kinds of ducks; but they turn out to taste very different. Let's denote regular, case-sensitive strings using abc, and special, case-insensitive strings using iabc. So for regular strings, equality is an e-r; for case-insensitive strings, equality is also an e-r (I trust that the truth of this is obvious). But if you try to use equality on *both* regular and case-insensitive strings, it fails to be an e-r: iabc =~ ABC returns True if you use the case-insensitive definition of equality, but returns False if you use the case-sensitive definition. There is no single definition of equality that is *simultaneously* case- sensitive and case-insensitive. A case-sensitive string is /not the same/ as a case-insensitive string. One's a duck, the other's a goose. I'd claim here that iabc =~ ABC must be False, because iabc =~ abc must be false also! To define it otherwise leads to the incoherence you describe. But the above proposition provides an easy answer. A valuable property might be that x =~ y if x and y are indistinguishable without using `is'. That's a little strong, because it implies that equality must look at *everything* about a particular object, not just whatever bits of data are relevant for the problem domain. Yes. That's one of the reasons that =~ isn't the same as ==. I've been thinking on my feet in this thread, so I haven't thought everything through. And as I mention below, there are /many/ useful equality predicates on values. As I didn't mention (but hope is obvious) having a massively-parametrized equality predicate is daft, and providing enough to suit every possible application equally so. But we might be able to do well enough with just one or two -- or maybe by just leaving things as they are. For example, consider storing data in a dict. D1 = {-1: 0, -2: 0} D2 = {-2: 0} D2[-1] = 0 D1 == D2 True We certainly want D1 and D2 to be equal. Do we? 
If we're using my `indistinguishable without using ``is''' criterion from above, then D1 and D2 are certainly different! To detect the difference, mutate one and see if the other changes:

    def distinct_dictionaries_p(D1, D2):
        """Decide whether D1 and D2 are the same dictionary or not.

        Not threadsafe.
        """
        # Note: despite the name, the result is True when D1 and D2
        # are the same object, and False when they are distinct.
        magic = []
        more_magic = [magic]
        old = D1.get('mumble', more_magic)
        D1['mumble'] = magic
        result = D2.get('mumble', more_magic) is magic
        if old is more_magic:
            del D1['mumble']
        else:
            D1['mumble'] = old
        return result

But that criterion was a suggestion -- a way of defining a coherent equivalence relation on the whole of the Python value space which is coarser than `is' and maybe more useful. My primary purpose in proposing it was to stimulate discussion: what /do/ we want from equality predicates? We already have `is', which is too fine-grained to be widely useful: it distinguishes between different instances of the number 50, for
Re: Rich Comparisons Gotcha
Steven D'Aprano st...@remove-this-cybersource.com.au wrote:

> There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent.

Indeed. The problem is a paucity of equality predicates. This is hardly surprising: Common Lisp has four general-purpose equality predicates (EQ, EQL, EQUAL and EQUALP), and many more type-specific ones (=, STRING=, STRING-EQUAL (yes, I know...), CHAR=, ...), and still doesn't really have enough. For example, EQUAL compares strings case-sensitively, but other arrays are compared by address; EQUALP will recurse into arbitrary arrays, but compares strings case-insensitively...

For the purposes of this discussion, however, it has enough to be able to distinguish between

  * numerical comparisons, which (as you explain later) should /not/ claim that two NaNs are equal, and
  * object comparisons, which clearly must declare an object equal to itself.

For example, I had the following edifying conversation with SBCL.

    CL-USER> ;; Return NaNs rather than signalling errors.
    (sb-int:set-floating-point-modes :traps nil)
    ; No value
    CL-USER> (defconstant nan (/ 0.0 0.0))
    NAN
    CL-USER> (loop for func in '(eql equal equalp =)
                   collect (list func (funcall func nan nan)))
    ((EQL T) (EQUAL T) (EQUALP T) (= NIL))
    CL-USER>

That is, a NaN is EQL, EQUAL and EQUALP to itself, but not = to itself. (Due to the vagaries of EQ, a NaN might or might not be EQ to itself or other NaNs.)

Python has a much more limited selection of equality predicates -- in fact, just == and is. The is operator is Python's equivalent of Lisp's EQ predicate: it compares objects by address. I can have a similar chat with Python.

    In [12]: nan = float('nan')
    In [13]: nan is nan
    Out[13]: True
    In [14]: nan == nan
    Out[14]: False
    In [16]: nan is float('nan')
    Out[16]: False

Python numbers are the same as themselves reliably, unlike in Lisp.
But there's no sensible way of asking whether something is `basically the same as' nan, like Lisp's EQL or EQUAL. I agree that the primary equality predicate for numbers must be the numerical comparison, and NaNs can't (sensibly) be numerically equal to themselves. Address comparisons are great when you're dealing with singletons, or when you carefully intern your objects. In other cases, you're left with ==. This puts a great deal of responsibility on the programmer of an == method to weigh carefully the potentially conflicting demands of compatibility (many other libraries just expect == to be an equality operator returning a straightforward truth value, and given that there isn't a separate dedicated equality operator, this isn't unreasonable), and doing something more domain-specifically useful. It's worth pointing out that numpy isn't unique in having == not return a straightforward truth value. The SAGE computer algebra system (and sympy, I believe) implement the == operator on algebraic formulae so as to construct equations. For example, the following is syntactically and semantically Python, with fancy libraries. sage: var('x') # x is now a variable x sage: solve(x**2 + 2*x - 4 == 1) [x == -sqrt(6) - 1, x == sqrt(6) - 1] (SAGE has some syntactic tweaks, such as ^ meaning the same as **, but I didn't use them.) I think this is an excellent use of the == operator -- but it does have some potential to interfere with other libraries which make assumptions about how == behaves. The SAGE developers have been clever here, though: sage: 2*x + 1 == (2 + 4*x)/2 2*x + 1 == (4*x + 2)/2 sage: bool(2*x + 1 == (2 + 4*x)/2) True sage: bool(2*x + 1 == (2 + 4*x)/3) False I think Python manages surprisingly well with its limited equality predicates. But the keyword there is `surprisingly' -- and it may not continue this trick forever. -- [mdw] -- http://mail.python.org/mailman/listinfo/python-list
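The NaN behaviour described above also shows up in containers, because list membership checks identity before falling back to ==. The sketch below illustrates that, and adds a hypothetical same_number() helper (the name and its rules are assumptions, written only loosely in the spirit of Lisp's EQL, not a faithful port of it) that treats two NaNs of the same type as `basically the same'.

```python
import math

nan = float("nan")

print(nan is nan)              # True: same object
print(nan == nan)              # False: numeric comparison
print(nan in [nan])            # True: membership tries identity before ==
print(float("nan") in [nan])   # False: different object, and == says no


def same_number(x, y):
    """Hypothetical predicate: same type, and either == or both NaN."""
    if type(x) is not type(y):
        return False
    if isinstance(x, float) and math.isnan(x) and math.isnan(y):
        return True
    return x == y


print(same_number(nan, float("nan")))   # True
print(same_number(1, 1.0))              # False: type is treated as a distinguisher
```

Whether type should count as a distinguisher is a design choice; this sketch picks the stricter answer only to keep the helper short.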
Re: Rich Comparisons Gotcha
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:

> I've already mentioned NaNs. Sentinel values also sometimes need to compare not equal with themselves. Forcing them to compare equal will cause breakage.

There's a conflict between such domain-specific considerations (NaNs, strange sentinels, SAGE's equations), and relatively natural assumptions about an == operator, such as it being an equivalence relation.

I don't know how to resolve this conflict without introducing a new function which is (or at least strongly encourages developers to arrange for it to be) an equivalence relation.

-- [mdw]
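One way to see which `natural assumptions' a given == actually satisfies is to test the three equivalence-relation laws over a small sample of values. This is a rough sketch under stated assumptions: the function name equivalence_failures is made up for illustration, and a finite sample can only find counterexamples, never prove that a relation is an equivalence.

```python
from itertools import product


def equivalence_failures(values, eq=lambda a, b: a == b):
    """Return the names of the equivalence laws that `eq` breaks on `values`."""
    failures = set()
    for x in values:
        if not eq(x, x):
            failures.add("reflexive")
    for x, y in product(values, repeat=2):
        if bool(eq(x, y)) != bool(eq(y, x)):
            failures.add("symmetric")
    for x, y, z in product(values, repeat=3):
        if eq(x, y) and eq(y, z) and not eq(x, z):
            failures.add("transitive")
    return failures


print(equivalence_failures([1, 1.0, "a", (1, 2)]))   # set()
print(equivalence_failures([1, float("nan")]))       # {'reflexive'}
```

The eq parameter lets the same check run against a candidate replacement predicate instead of the built-in ==.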
Re: Rich Comparisons Gotcha
On Tue, 06 Jan 2009 01:24:58 +, Mark Wooding wrote:

> Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:
>> I've already mentioned NaNs. Sentinel values also sometimes need to compare not equal with themselves. Forcing them to compare equal will cause breakage.
>
> There's a conflict between such domain-specific considerations (NaNs, strange sentinels, SAGE's equations), and relatively natural assumptions about an == operator, such as it being an equivalence relation.

Such assumptions only hold under particular domains though. You can't assume equality is an equivalence relation once you start thinking about arbitrary domains.

> I don't know how to resolve this conflict without introducing a new function which is (or at least strongly encourages developers to arrange for it to be) an equivalence relation.

But there cannot be any such function which is a domain-independent equivalence relation, not if we're talking about arbitrarily wacky domains. Even something as straightforward as 'is' can't be an equivalence relation under a domain where identity isn't well-defined.

-- Steven
Re: Rich Comparisons Gotcha
On Wed, 10 Dec 2008 17:58:49 -0500, Luis Zarrabeitia wrote:

> On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
>> The deficiency is in the feature of rich comparisons, not numpy's implementation of it. __eq__() is allowed to return non-booleans; however, there are some parts of Python's implementation like list.__contains__() that still expect the return value of __eq__() to be meaningfully cast to a boolean.
>
> list.__contains__, tuple.__contains__, the 'if' keyword... How do you suggest fixing the list.__contains__ implementation?

I suggest you don't, because I don't think it's broken. I think it's working as designed. It doesn't succeed with arbitrary data types which may be broken, buggy or incompatible with __contains__'s design, but that's okay, it's not supposed to.

> Should I wrap all my ifs with this?:
>
>     if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
>         res = compare_numpy(a, b)
>     elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
>         res = compare_someotherclass(a, b)
>     ...
>     else:
>         res = (a == b)
>     if res:
>         # do whatever

No, inlining that code everywhere you have an if would be stupid. What you should do is write a single function equals(x, y) that does precisely what you want it to do, in whatever way you want, and then call it:

    if equals(a, b):
        # do whatever

Or, put your data inside a wrapper. If you read back over my earlier posts in this thread, I suggested a lightweight wrapper class you could use. You could make it even more useful by using delegation to make the wrapped class behave *exactly* like the original, except for __eq__. You don't even need to wrap every single item:

    def wrap_or_not(obj):
        if type(obj) in list_of_bad_types_i_know_about:
            return EqualityWrapper(obj)
        return obj

    data = [1, 2, 3, BadData, 4]
    data = map(wrap_or_not, data)

It isn't really that hard to deal with these things, once you give up the illusion that your code should automatically work with arbitrarily wacky data types that you don't control.
-- Steven
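The wrapper idea can be sketched roughly as follows. The EqualityWrapper class from the earlier posts is not shown in this part of the thread, so everything here is an assumption about what such a class might look like: it compares by identity, delegates other attribute access to the wrapped object, and is exercised against a stand-in type (Elementwise, also made up) rather than a real numpy array.

```python
class NoTruthValue:
    """Stand-in comparison result that refuses to act as a boolean."""

    def __bool__(self):
        raise ValueError("the truth value of this comparison is ambiguous")


class Elementwise:
    """Stand-in for a type whose == does not return a plain truth value."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return NoTruthValue()


class EqualityWrapper:
    """Hypothetical wrapper: delegate everything except ==, which uses identity."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __eq__(self, other):
        if isinstance(other, EqualityWrapper):
            other = other.wrapped
        return self.wrapped is other

    def __hash__(self):
        return id(self.wrapped)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so this delegates the rest.
        return getattr(self.wrapped, name)


BAD_TYPES = (Elementwise,)   # assumed list of types known to misbehave


def wrap_or_not(obj):
    return EqualityWrapper(obj) if isinstance(obj, BAD_TYPES) else obj


data = [1, Elementwise([0, 0]), 3.0]
try:
    3 in data
except ValueError as exc:
    print("unwrapped:", exc)

safe = [wrap_or_not(obj) for obj in data]
print(3 in safe)        # True
print(safe[1].items)    # [0, 0], reached through delegation
```

Searching the wrapped list for the awkward object itself only works if the needle is wrapped too; an unwrapped Elementwise on the left-hand side still gets its own __eq__ called first.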
Re: Rich Comparisons Gotcha
On 2008-12-10 23:21, Luis Zarrabeitia wrote: On Wednesday 10 December 2008 02:44:45 pm you wrote: Even in statically typed languages, when you override the equality operator/function you can choose not to return a valid answer (raise an exception). And it would break all the cases mentioned above (element in list, etc). But that isn't the right thing to do. The language doesn't/can't prohibit you from breaking the equality test, but that shouldn't be considered a feature. (a==b).all() makes no sense. Perhaps not in your application, but it does make sense in other numeric applications, e.g. ones that work on vectors or matrixes. I'd suggest you simply wrap the comparison in a function and then have that apply the necessary conversion to a boolean. I do numeric work... I'm finishing my MSc in applied math and I'm programing mostly with python. And I'd rather have a.compare_with(b), or a.elementwise_compare(b), or whatever name, rather than (a==b).all(). In fact, I'd very much like to have an a.compare_with(b, epsilon=e).all() (to account for rounding errors), and with python2.5, all(a.compare_with(b)). Yes, I could create an element_compare(a,b) function. But I still can't use a==b and have a meaningful result. Ok, I can (and do) ignore that, it's just one library, I'll keep track of the types before asking for equality (already an ugly thing to do in python), but the a==b behaviour breaks the lists (a in ll, ll.indexof(a)) even for elements not in numpy. ¿Should I also ignore lists? You should perhaps reconsider your use of lists. Lists with elements of different types can be tricky at times, so perhaps you either need a different data type which doesn't scan all elements or a separate search function that knows about your type setup. The fact that comparisons can raise exceptions is not new to Python, so this problem can pop up in other areas as well, esp. when using 3rd party extensions. 
Regarding the other issues like new methods you should really talk to the numpy developers, since they are the ones who could fix this. The concept of equality between two arrays is very well defined, as is the element-by-element comparison. There is a need to test for both - then the way to test for equality should be the equality test. I'm certain that something could be worked out. A quick paragraph that took me just a few minutes to type shouldn't be construed as a PEP that will solve all the problems :D. As always: the Devil is in the details :-) Of course...

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 11 2008)
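The suggestion to wrap the comparison in a function that returns a plain boolean, combined with the wish for an epsilon parameter, might look something like the sketch below. The name arrays_equal and its signature are assumptions made for illustration; it works on any sequences of numbers and is not numpy's API.

```python
import math


def arrays_equal(a, b, epsilon=0.0):
    """Return a plain bool: True if a and b have the same length and every
    pair of elements differs by at most epsilon."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=epsilon)
        for x, y in zip(a, b)
    )


print(arrays_equal([0.1 + 0.2, 1.0], [0.3, 1.0]))                 # False
print(arrays_equal([0.1 + 0.2, 1.0], [0.3, 1.0], epsilon=1e-9))   # True
print(arrays_equal([1, 2], [1, 2, 3]))                            # False
```

Because the result is always True or False, the function can sit directly in an if statement or be used to build a search helper, whatever == does on the underlying type.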
Re: Rich Comparisons Gotcha
Rhamphoryncus wrote:

> You grossly overvalue using the in operator on lists.

Maybe. But there is more to it than just 'in'. If you do:

    c = numpy.zeros((2,))
    ll = [1, c, 3.]

then the following all throw errors:

    3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
    c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)

Note how the presence of c in the list makes it behave wrong for 3 as well.

> It's far more common to use a dict or set for containment tests, due to O(1) performance rather than O(n). I doubt the numpy array supports hashing, so an error for misuse is all you should expect.

Indeed it does not. So there is not much to be gained from modifying equality comparison with sets/dicts.

> In the rare case that you want to test for identity in a list, you can easily write your own function to do it upfront:
>
>     def idcontains(seq, obj):
>         for i in seq:
>             if i is obj:
>                 return True
>         return False

Again, you can code around any particular case (though wrappers look like a more robust solution). Still, why not get rid of this wart, if we can find a way?

---
Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002
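The identity-based approach extends naturally from containment to lookup. As a small sketch (idindex is a made-up companion to the idcontains function quoted above, not something proposed in the thread), both helpers avoid calling == at all, so they work no matter what the elements' __eq__ returns.

```python
def idcontains(seq, obj):
    """True if obj itself (not merely an equal object) is in seq."""
    return any(item is obj for item in seq)


def idindex(seq, obj):
    """Position of obj itself in seq; raises ValueError if it is absent."""
    for position, item in enumerate(seq):
        if item is obj:
            return position
    raise ValueError("object not found by identity")


a = [0.0, 0.0]
b = [0.0, 0.0]           # equal to a, but a different object
seq = ["x", a, "y"]

print(idcontains(seq, a))   # True
print(idcontains(seq, b))   # False
print(idindex(seq, a))      # 1
```

The trade-off is the one discussed in the thread: identity is too fine-grained for values such as numbers and strings, where an equal but distinct object should normally count as a match.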
Re: Rich Comparisons Gotcha
Rhodri James wrote:

> On Mon, 08 Dec 2008 14:24:59 -, Rasmus Fogh wrote:
>> On the minus side there would be the difference between '__equal__' and '__eq__' to confuse people.
>
> This is a very big minus. It would be far better to spell __equal__ in such a way as to make it clear why it wasn't the same as __eq__, otherwise you end up with the confusion that the Perl == and eq operators regularly cause.

You are probably right, unfortunately. That proposal is unlikely to fly. Do you think my latest proposal, raising BoolNotDefinedError, has better chances?

---
Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002
Re: Rich Comparisons Gotcha
Quoting Rasmus Fogh [EMAIL PROTECTED]:

> Rhamphoryncus wrote:
>> You grossly overvalue using the in operator on lists.
>
> Maybe. But there is more to it than just 'in'. If you do:
>
>     c = numpy.zeros((2,))
>     ll = [1, c, 3.]
>
> then the following all throw errors:
>
>     3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
>     c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)
>
> Note how the presence of c in the list makes it behave wrong for 3 as well.

I think I lost the first messages on this thread, but... Wouldn't it be easier to just fix numpy? I see no need to have the == return anything but a boolean, at least in numpy's case. The syntax 'a == b' is an equality test, not a detailed summary of why they may be different, and (a==b).all() makes little sense to read unless you know beforehand that a and b are numpy arrays. When I'm comparing normal objects, I do not expect (nor should I) the == operator to return an attribute-by-attribute summary of what was equal and what wasn't.

Why is numpy's == overloaded in such a counterintuitive way? I realize that an elementwise comparison makes a lot of sense, but it could have been done instead with a.compare_with(b) (or even better, a.compare_with(b, epsilon=e)). No unexpected breakage, and you have the chance of specifying when you consider two elements to be equal - very useful.

Even the transition itself could be done without breaking much code... Make the == op return an object that wraps the array of bools (instead of the array itself), give it the any() and all() methods, and make __nonzero__/__bool__ equivalent to all().

-- Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie
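The transition idea in the last paragraph can be sketched with a toy type. Nothing below is numpy code: ComparisonResult and Vector are hypothetical names, and the scalar branch only loosely imitates elementwise comparison. The point is that an == result which keeps the per-element flags but defines its truth value as all() lets the list operations from earlier in the thread run without errors.

```python
class ComparisonResult:
    """Hypothetical result of an elementwise ==, with a truth value of all()."""

    def __init__(self, flags):
        self.flags = list(flags)

    def all(self):
        return all(self.flags)

    def any(self):
        return any(self.flags)

    def __bool__(self):          # spelled __nonzero__ in Python 2
        return self.all()


class Vector:
    """Toy array type used only for this illustration (not numpy)."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if isinstance(other, Vector):
            if len(self.items) != len(other.items):
                return ComparisonResult([False])
            return ComparisonResult(
                x == y for x, y in zip(self.items, other.items)
            )
        # Compare every element with a scalar.
        return ComparisonResult(x == other for x in self.items)


c = Vector([0, 0])
ll = [1, c, 3.0]

print(3 in ll, ll.index(3), ll.count(3))   # True 2 1
print(c in ll, ll.index(c))                # True 1
print((Vector([1, 2]) == Vector([1, 3])).any())   # True: first elements match
```

One wrinkle of this design is that an empty comparison is vacuously true under all(), so an empty Vector would compare truthy against any scalar.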
Re: Rich Comparisons Gotcha
On 2008-12-10 16:40, Luis Zarrabeitia wrote: Quoting Rasmus Fogh [EMAIL PROTECTED]: Rhamphoryncus wrote: You grossly overvalue using the in operator on lists. Maybe. But there is more to it than just 'in'. If you do: c = numpy.zeros((2,)) ll = [1, c, 3.] then the following all throw errors: 3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3) c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c) Note how the presence of c in the list makes it behave wrong for 3 as well. I think I lost the first messages on this thread, but... Wouldn't be easier to just fix numpy? I see no need to have the == return anything but a boolean, at least on Numpy's case. The syntax 'a == b' is an equality test, not a detailed summary of why they may be different, and (a==b).all() makes no little sense to read unless you know beforehad that a and b are numpy arrays. When I'm comparing normal objects, I do not expect (nor should I) the == operator to return an attribute-by-attribute summary of what was equal and what wasn't. Why is numpy's == overloaded in such a counter intuitive way? I realize that an elementwise comparison makes a lot of sense, but it could have been done instead with a.compare_with(b) (or even better, a.compare_with(b, epsilon=e)). No unexpected breakage, and you have the chance of specifying when you consider two elements to be equal - very useful. Rich comparisons were added to Python at the request of the Numeric (now numpy) developers and they have been part of Python a Numeric for many many years. I don't think it's likely they'll change things back to the days of Python 1.5.2 ;-) Even the transition itself could be done without breaking much code... Make the == op return an object that wraps the array of bools (instead of the array itself), give it the any() and all() methods, and make __nonzero__/__bool__ equivalent to all(). 
That would cause a lot of confusion on its own, since such an object wouldn't behave in the same way as say a regular Python list (bool([0]) == True). -- Marc-Andre Lemburg, eGenix.com -- http://mail.python.org/mailman/listinfo/python-list
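For readers following along, the wrapper idea and the objection to it can be sketched roughly as below. This is only an illustration under assumptions: ComparisonResult and elementwise_eq are made-up names, plain Python lists stand in for arrays, and nothing here is numpy's actual API.

```python
class ComparisonResult:
    """Hypothetical wrapper around the element-wise results of a comparison."""

    def __init__(self, flags):
        self.flags = list(flags)

    def any(self):
        return any(self.flags)

    def all(self):
        return all(self.flags)

    def __bool__(self):
        # The proposal: the truth value means "every element compared equal".
        return self.all()


def elementwise_eq(a, b):
    """Stand-in for what == might return under the proposal.

    Assumes a and b are sequences of the same length.
    """
    return ComparisonResult(x == y for x, y in zip(a, b))


same = elementwise_eq([1, 2, 3], [1, 2, 3])
diff = elementwise_eq([1, 2, 3], [1, 0, 3])

# The objection: a one-element result holding False is falsy here,
# while a one-element Python list holding False (or 0) is truthy.
wrapped_truth = bool(ComparisonResult([False]))
list_truth = bool([False])
```

With this sketch bool(same) is True and bool(diff) is False, which is what the proposal wants, but wrapped_truth and list_truth disagree, which is the inconsistency being pointed out.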
Re: Rich Comparisons Gotcha
On Dec 10, 7:49 am, Rasmus Fogh [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: You grossly overvalue using the in operator on lists. Maybe. But there is more to it than just 'in'. If you do: c = numpy.zeros((2,)) ll = [1, c, 3.] then the following all throw errors: 3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3) c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c) Note how the presence of c in the list makes it behave wrong for 3 as well. All of these are O(n). Use a set or dict. What is your use case anyway? It's far more common to use a dict or set for containment tests, due to O(1) performance rather than O(n). I doubt the numpy array supports hashing, so an error for misuse is all you should expect. Indeed it doees not. So there is not much to be gained from modifying equality comparison with sets/dicts. In the rare case that you want to test for identity in a list, you can easily write your own function to do it upfront: def idcontains(seq, obj): for i in seq: if i is obj: return True return False Again, you can code around any particular case (though wrappers look like a more robust solution). Still, why not get rid of this wart, if we can find a way? The wart is a feature. I agree that it's confusing, but the cost of adding a special case to work around it is far in excess of the original problem. Now if you phrased it as a hypothetical discussion for the purpose of learning about language design, that'd be another matter. -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Rasmus Fogh wrote: Rhamphoryncus wrote: You grossly overvalue using the in operator on lists. Maybe. But there is more to it than just 'in'. If you do: c = numpy.zeros((2,)) ll = [1, c, 3.] then the following all throw errors: 3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3) c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c) Note how the presence of c in the list makes it behave wrong for 3 as well. So do not put numpy arrays into lists without wrapping them. They were designed and semi-optimized, by a separate community, for a specific purpose -- numerical computation -- and not for 'playing nice' with other Python objects. It is a design feature of Python that people can implement specialized objects with specialized behaviors for specialized purposes. -- http://mail.python.org/mailman/listinfo/python-list
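A minimal sketch of what "wrapping" could look like, for anyone who wants to try it. The names IdentityWrapper and FakeArray are hypothetical; FakeArray only imitates the relevant numpy behaviour (an == result that refuses to become a bool) so the example runs without numpy installed.

```python
class _Ambiguous:
    """Comparison result that has no single truth value."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class FakeArray:
    """Hypothetical stand-in for a numpy array."""

    def __eq__(self, other):
        return _Ambiguous()


class IdentityWrapper:
    """Wraps any object and compares by identity of the wrapped object."""

    def __init__(self, obj):
        self.obj = obj

    def __eq__(self, other):
        if isinstance(other, IdentityWrapper):
            return self.obj is other.obj
        return self.obj is other

    def __hash__(self):
        return id(self.obj)


c = FakeArray()
raw = [1, c, 3.0]
wrapped = [1, IdentityWrapper(c), 3.0]

try:
    3 in raw
    raw_raised = False
except ValueError:
    raw_raised = True

found_three = 3 in wrapped
found_c = IdentityWrapper(c) in wrapped
```

The unwrapped list fails on the lookup for 3 as soon as the scan reaches c, while the wrapped list answers both lookups. The cost is that every access has to go through .obj.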
Re: Rich Comparisons Gotcha
On Wednesday 10 December 2008 10:50:57 am M.-A. Lemburg wrote: On 2008-12-10 16:40, Luis Zarrabeitia wrote: Quoting Rasmus Fogh [EMAIL PROTECTED]: Rhamphoryncus wrote: Rich comparisons were added to Python at the request of the Numeric (now numpy) developers and they have been part of Python a Numeric for many many years. I don't think it's likely they'll change things back to the days of Python 1.5.2 ;-) Please define rich comparisons for me. It seems that I do not understand the term - I was thinking it meant the ability to override the comparison operators, and specially, the ability to override them independently. Even in statically typed languages, when you override the equality operator/function you can choose not to return a valid answer (raise an exception). And it would break all the cases mentioned above (element in list, etc). But that isn't the right thing to do. The language doesn't/can't prohibit you from breaking the equality test, but that shouldn't be considered a feature. (a==b).all() makes no sense. Even the transition itself could be done without breaking much code... Make the == op return an object that wraps the array of bools (instead of the array itself), give it the any() and all() methods, and make __nonzero__/__bool__ equivalent to all(). That would cause a lot of confusion on its own, since such an object wouldn't behave in the same way as say a regular Python list (bool([0]) == True). I'm certain that something could be worked out. A quick paragraph that took me just a few minutes to type shouldn't be construed as a PEP that will solve all the problems :D. -- Luis Zarrabeitia (aka Kyrie) Fac. de Matemática y Computación, UH. http://profesores.matcom.uh.cu/~kyrie -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On 2008-12-10 20:01, Luis Zarrabeitia wrote: On Wednesday 10 December 2008 10:50:57 am M.-A. Lemburg wrote: On 2008-12-10 16:40, Luis Zarrabeitia wrote: Quoting Rasmus Fogh [EMAIL PROTECTED]: Rhamphoryncus wrote: Rich comparisons were added to Python at the request of the Numeric (now numpy) developers and they have been part of Python a Numeric for many many years. I don't think it's likely they'll change things back to the days of Python 1.5.2 ;-) Please define rich comparisons for me. It seems that I do not understand the term - I was thinking it meant the ability to override the comparison operators, and specially, the ability to override them independently. That's one of the features, rich comparisons added. Another is the ability to return arbitrary objects instead of just booleans or integers: http://www.python.org/dev/peps/pep-0207/ David was a Numeric developer at the time (among other things). Even in statically typed languages, when you override the equality operator/function you can choose not to return a valid answer (raise an exception). And it would break all the cases mentioned above (element in list, etc). But that isn't the right thing to do. The language doesn't/can't prohibit you from breaking the equality test, but that shouldn't be considered a feature. (a==b).all() makes no sense. Perhaps not in your application, but it does make sense in other numeric applications, e.g. ones that work on vectors or matrixes. I'd suggest you simply wrap the comparison in a function and then have that apply the necessary conversion to a boolean. Even the transition itself could be done without breaking much code... Make the == op return an object that wraps the array of bools (instead of the array itself), give it the any() and all() methods, and make __nonzero__/__bool__ equivalent to all(). That would cause a lot of confusion on its own, since such an object wouldn't behave in the same way as say a regular Python list (bool([0]) == True). 
I'm certain that something could be worked out. A quick paragraph that took me just a few minutes to type shouldn't be construed as a PEP that will solve all the problems :D. As always: the Devil is in the details :-) -- Marc-Andre Lemburg, eGenix.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Wednesday 10 December 2008 02:44:45 pm you wrote: Even in statically typed languages, when you override the equality operator/function you can choose not to return a valid answer (raise an exception). And it would break all the cases mentioned above (element in list, etc). But that isn't the right thing to do. The language doesn't/can't prohibit you from breaking the equality test, but that shouldn't be considered a feature. (a==b).all() makes no sense. Perhaps not in your application, but it does make sense in other numeric applications, e.g. ones that work on vectors or matrices. I'd suggest you simply wrap the comparison in a function and then have that apply the necessary conversion to a boolean. I do numeric work... I'm finishing my MSc in applied math and I'm programming mostly with python. And I'd rather have a.compare_with(b), or a.elementwise_compare(b), or whatever name, rather than (a==b).all(). In fact, I'd very much like to have an a.compare_with(b, epsilon=e).all() (to account for rounding errors), and with python2.5, all(a.compare_with(b)). Yes, I could create an element_compare(a,b) function. But I still can't use a==b and have a meaningful result. Ok, I can (and do) ignore that, it's just one library, I'll keep track of the types before asking for equality (already an ugly thing to do in python), but the a==b behaviour breaks the lists (a in ll, ll.index(a)) even for elements not in numpy. Should I also ignore lists? The concept of equality between two arrays is very well defined, as is the element-by-element comparison. There is a need to test for both - then the way to test for equality should be the equality test. I'm certain that something could be worked out. A quick paragraph that took me just a few minutes to type shouldn't be construed as a PEP that will solve all the problems :D. As always: the Devil is in the details :-) Of course... -- Luis Zarrabeitia (aka Kyrie) Fac. de Matemática y Computación, UH. 
http://profesores.matcom.uh.cu/~kyrie -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote: The deficiency is in the feature of rich comparisons, not numpy's implementation of it. __eq__() is allowed to return non-booleans; however, there are some parts of Python's implementation like list.__contains__() that still expect the return value of __eq__() to be meaningfully cast to a boolean.

list.__contains__, tuple.__contains__, the 'if' keyword... How can you suggest to fix the list.__contains__ implementation? Should I wrap all my ifs with this?:

    if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
        res = compare_numpy(a, b)
    elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
        res = compare_someotherclass(a, b)
    ...
    else:
        res = (a == b)
    if res:
        # do whatever

-- Luis Zarrabeitia (aka Kyrie) Fac. de Matemática y Computación, UH. http://profesores.matcom.uh.cu/~kyrie -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Wed, 10 Dec 2008 17:21:51 -0500, Luis Zarrabeitia wrote: I do numeric work... I'm finishing my MSc in applied math and I'm programing mostly with python. And I'd rather have a.compare_with(b), or a.elementwise_compare(b), or whatever name, rather than (a==b).all(). Unluckily for you, the Numeric/Numpy people wanted something else. They asked first, there's a lot more of them, and their project is very important to Python's continued success. In fact, I'd very much like to have an a.compare_with(b, epsilon=e).all() (to account for rounding errors), and with python2.5, all(a.compare_with(b)). Yes, I could create an element_compare(a,b) function. Absolutely. But I still can't use a==b and have a meaningful result. That's right. *ANY* operation in Python can fail, given arbitrary data, with the possible exception of the id() function and the is and is not operators. You have to deal with it. Ok, I can (and do) ignore that, it's just one library, I'll keep track of the types before asking for equality (already an ugly thing to do in python), but the a==b behaviour breaks the lists (a in ll, ll.indexof(a)) even for elements not in numpy. ¿Should I also ignore lists? That depends on what sort of contract your code is giving. Does it promise to work with any imaginable data whatsoever, no matter how badly broken or poorly designed or incompatible with what you're trying to do? If so, then I suggest your contract is broken, not the behaviour of list. You can't make trustworthy promises to deal with arbitrary data types that you don't control, that can fail in arbitrary ways. Here's something for you to consider: class Boobytrap: def __eq__(self, other): if other == 1: return True elif other == 2: while True: pass return False alist = [0, Boobytrap(), 2, 3] 1 in alist True 3 in alist True 5 in alist False 2 in alist What do you expect should happen? 
The concept of equality between two arrays is very well defined, as it is also very well defined the element-by-element comparison. There is a need to test for both - then the way to test for equality should be the equality test. The Numpy people disagree with you. It was from their request that Python was changed to allow __eq__ to return arbitrary objects. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Steven DAprano wrote: On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote: For my personal problem I could indeed wrap all objects in a wrapper with whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) So, I would much prefer a language change. I am not competent to even propose one properly, but I'll try. You think changing the language is easier than applying a wrapper to your own data??? Oh my, that's too funny for words. Any individual case of the problem can be hacked somehow - I have already fixed this one. My point is that python would be a better language if well-written classes that followed normal python conventions could be relied on to work correctly with list, and that it is worth trying to bring this about. Lists are a central structure of the language after all. Of course you can disagree, or think the work required would be disproportionate, but surely there is nothing unreasonable about my point? Rasmus --- Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED] Dept. of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002 -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Steven DAprano wrote: On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote: snip What might be a sensible behaviour (unlike your proposed wrapper) Sorry 1) I was rude, 2) I thanked TJR for your wrapper class proposal in a later mail. It is yours. What do you dislike about my wrapper class? Perhaps it is fixable. I think it is a basic requirement for functioning lists that you get alist = [1,x] x in alist True alist.remove(x) alist [1] # unless of course x == 1, in which case the list is [x]. Your wrapper would not provide this behaviour. It is necessary to do if x is y: return True be it in the eq() function, or in the list implementation. Note that this is the current python behaviour for nan in lists, whatever the mathematics say. would be the following: def eq(x, y): if x is y: return True I've already mentioned NaNs. Sentinel values also sometimes need to compare not equal with themselves. Forcing them to compare equal will cause breakage. The list.__contains__ method already checks 'x is y' before it checks 'x == y'. I'd say that a list where my example above does not work is broken already, but of course I do not want to break further code. Could you give an example of this use of sentinel values? else: try: return (x == y) except Exception: return False Why False? Why not True? If an error occurs inside __eq__, how do you know that the correct result was False? class Broken(object): def __eq__(self, other): return Treu # oops, raises NameError In managing collections the purpose of eq would be to divide objects into a small set that are all equal to each other, and a larger set that are all unequal to all members of the first set. That requires default to False. If you default to True then eq(aNumpyArray, x) would return True for all x. If an error occurs inside __eq__ it could be 1) because __eq__ is badly written, or 2) because the type of y was not considered by the implementers of x or is in some deep way incompatible with x. 
1) I cannot help, and for 2) I am simply saying that value semantics require an __eq__ that returns a truth value. In the absence of that I want identity semantics. Rasmus -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Dec 8, 2:24 pm, Rasmus Fogh [EMAIL PROTECTED] wrote: So, I would much prefer a language change. I am not competent to even propose one properly, but I'll try. I don't see any technical problems in what you propose: as far as I can see it's entirely feasible. However: should. On the minus side there would be the difference between '__equal__' and '__eq__' to confuse people. I think this is exactly what makes the idea a non-starter. There are already enough questions on the lists about when to use 'is' and when to use '==', without adding an 'equals' function into the mix. It would add significant extra complexity to the core language, for questionable (IMO) gain. There are certainly other languages for which this distinction would make sense; I just don't think it's appropriate for Python, with its emphasis on practicality and and simplicity. Mark On the plus side the behaviour of objects inside collections would now be explicitly defined, and __eq__ and __equal__ would be so similar that most people could ignore the distinction. Some examples: # NaN: # For floats, __equal__ would be the same as __eq__. For NaN this gives x = float('NaN') y = float('NaN') x == x False equal(x,x) True equal(x,y) False # It may be problematical mathematically, but computationally it makes # perfect sense that looking in a given storage location will give you the # same value every time, even if the actual value happens to be undefined. # The behaviour is simple to describe, and indeed NaN does behave this way # in collections at the moment. All we are doing is documenting it clearly. # numpy Numpy would have no __equal__ function, so we would have pure identity semantics - 'equals(x,y)' would be the same as 'x is y' # ordinary numbers. Any Python object with value semantics would need an __equal__ function with the correct behaviour. 
Mark Dickinson pointed out the thread Comparing float and decimal, which shows that comparisons between float and decimal numbers do not currently satisfy 3). It would not be attractive to have __equal__ and __eq__ behave differently for ordinary numbers, so if the relevant __eq__ can not be fixed that is a problem for my proposal. At this point I shall try to retire gracefully. Regrettably I am not competent to discuss if this can be done, how it can be done, and how much work is required. Rasmus --- Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED] Dept. of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002 -- http://mail.python.org/mailman/listinfo/python-list
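The remark that NaN already gets identity treatment inside collections can be checked directly. The snippet below only uses built-in floats and lists; it reflects the documented rule that containment behaves like any(x is e or x == e for e in y).

```python
x = float("nan")
y = float("nan")

# A NaN never compares equal, not even to itself.
self_equal = (x == x)

# Inside a list the identity check comes first, so the very same
# object is found even though == says False.
same_object_found = x in [x]
other_nan_found = y in [x]

# The same shortcut applies to count(), index() and list equality.
count_of_x = [x, y].count(x)
index_of_x = [y, x].index(x)
lists_equal = ([x] == [x])
```

So x is found in a list that holds x itself, while a different NaN object y is not, which is the behaviour the proposed equal() would simply be documenting.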
Re: Rich Comparisons Gotcha
Mark Dickinson wrote: On Dec 8, 2:24 pm, Rasmus Fogh [EMAIL PROTECTED] wrote: So, I would much prefer a language change. I am not competent to even propose one properly, but I'll try. I don't see any technical problems in what you propose: as far as I can see it's entirely feasible. However: should. On the minus side there would be the difference between '__equal__' and '__eq__' to confuse people. I think this is exactly what makes the idea a non-starter. There are already enough questions on the lists about when to use 'is' and when to use '==', without adding an 'equals' function into the mix. It would add significant extra complexity to the core language, for questionable (IMO) gain.

So: It is perfectly acceptable behaviour to have __eq__ return a value that cannot be cast to a boolean, but it still does break the python list. The fixes proposed so far all get the thumbs down, for various good reasons. How about:

- Define a new built-in Exception BoolNotDefinedError(ValueError)
- Have list.__contains__ (etc.) use the following comparison internally:

    def newCollectionTest(x,y):
        if x is y:
            return True
        else:
            try:
                return bool(x == y)
            except BoolNotDefinedError:
                return False

- Recommend that numpy.array.__nonzero__ and similar cases raise BoolNotDefinedError instead of ValueError

Objects that choose to raise BoolNotDefinedError will now work in lists, with identity semantics. Objects that do not raise BoolNotDefinedError have no change in behaviour. Remains to be seen how hard it is to implement, and how much it slows down list.__contains__

Rasmus --- Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED] Dept. of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002 -- http://mail.python.org/mailman/listinfo/python-list
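The BoolNotDefinedError proposal can be tried out in pure Python, roughly as follows. Everything here is an assumption made for illustration: BoolNotDefinedError is not a real built-in, ArrayLike is a made-up stand-in for an array type that adopts the recommendation, and contains() imitates what list.__contains__ would do under the proposal.

```python
class BoolNotDefinedError(ValueError):
    """Hypothetical exception from the proposal; not part of Python."""


class _NoTruthValue:
    """Comparison result that refuses conversion to bool."""

    def __bool__(self):
        raise BoolNotDefinedError("no single truth value")


class ArrayLike:
    """Hypothetical array type following the proposed recommendation."""

    def __eq__(self, other):
        return _NoTruthValue()


def new_collection_test(x, y):
    """Identity first, then ==, treating 'no truth value' as not equal."""
    if x is y:
        return True
    try:
        return bool(x == y)
    except BoolNotDefinedError:
        return False


def contains(seq, obj):
    """What list.__contains__ might do under the proposal."""
    return any(new_collection_test(item, obj) for item in seq)


a = ArrayLike()
seq = [1, a, 3.0]

found_three = contains(seq, 3)
found_a = contains(seq, a)
found_other = contains(seq, ArrayLike())
```

With this sketch the array-like object is found by identity, a different array-like object is not found, and ordinary numbers in the same list keep their usual value semantics. Any other exception raised during comparison still propagates.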
Re: Rich Comparisons Gotcha
You grossly overvalue using the in operator on lists. It's far more common to use a dict or set for containment tests, due to O(1) performance rather than O(n). I doubt the numpy array supports hashing, so an error for misuse is all you should expect.

In the rare case that you want to test for identity in a list, you can easily write your own function to do it upfront:

    def idcontains(seq, obj):
        for i in seq:
            if i is obj:
                return True
        return False

-- http://mail.python.org/mailman/listinfo/python-list
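The same identity-only approach carries over to the other list operations mentioned earlier in the thread. A possible sketch follows; idindex and idcount are hypothetical helper names added here for illustration, and idcontains is rewritten with any() but behaves like the loop version.

```python
def idcontains(seq, obj):
    """True if obj itself (not merely an equal object) is in seq."""
    return any(item is obj for item in seq)


def idindex(seq, obj):
    """Index of the first element that is obj; ValueError if absent."""
    for i, item in enumerate(seq):
        if item is obj:
            return i
    raise ValueError("object not in sequence")


def idcount(seq, obj):
    """Number of positions in seq that hold obj itself."""
    return sum(1 for item in seq if item is obj)


marker = object()
seq = [1, marker, 3.0, marker]

has_marker = idcontains(seq, marker)
has_stranger = idcontains(seq, object())
first_position = idindex(seq, marker)
occurrences = idcount(seq, marker)
```

Note the trade-off: since only identity is checked, idcontains([3.0], 3) is False even though 3.0 == 3, so these helpers are not a drop-in replacement for value-based lookups.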
Re: Rich Comparisons Gotcha
On Mon, 08 Dec 2008 14:24:59 -, Rasmus Fogh [EMAIL PROTECTED] wrote: On the minus side there would be the difference between '__equal__' and '__eq__' to confuse people. This is a very big minus. It would be far better to spell __equal__ in such a way as to make it clear why it wasn't the same as __eq__, otherwise you end up with the confusion that the Perl == and eq operators regularly cause. -- Rhodri James *-* Wildebeeste Herder to the Masses -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Robert Kern wrote: James Stroud wrote: I think it skips straight to __eq__ if the element is not the first in the list. No, it doesn't skip straight to __eq__(). y is 1 returns False, so (y==1) is checked. When y is a numpy array, this returns an array of bools. list.__contains__() tries to convert this array to a bool and ndarray.__nonzero__() raises the exception. list.__contains__() checks is then __eq__() for each element before moving on to the next element. It does not try is for all elements, then try __eq__() for all elements. Ok. Thanks for the explanation. That no one acknowledges this makes me feel like a conspiracy is afoot. I don't know what you think I'm not acknowledging. Sorry. That was a failed attempt at humor. James -- http://mail.python.org/mailman/listinfo/python-list
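The per-element order described here (identity check, then ==, then on to the next element) can be observed with a small experiment. ArrayLike below is a hypothetical stand-in that mimics only the relevant numpy behaviour, so the example runs without numpy.

```python
class Ambiguous:
    """Comparison result with no usable truth value."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a numpy array; counts __eq__ calls."""

    eq_calls = 0

    def __eq__(self, other):
        ArrayLike.eq_calls += 1
        return Ambiguous()


def safe_contains(seq, obj):
    """Result of `obj in seq`, or the error text if the test raises."""
    try:
        return obj in seq
    except ValueError as exc:
        return "ValueError: %s" % exc


y = ArrayLike()

# y is the first element: the identity check succeeds, __eq__ never runs.
first = safe_contains([y, 1], y)
calls_after_first = ArrayLike.eq_calls

# 1 comes first: it is not y, so == is tried and its result
# cannot be converted to a bool.
second = safe_contains([1, y], y)
calls_after_second = ArrayLike.eq_calls
```

This reproduces the order dependence from the original report: the lookup succeeds when y is the first element and raises when an ordinary number precedes it.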
Re: Rich Comparisons Gotcha
Rober Kern wrote: James Stroud wrote: Steven D'Aprano wrote: On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote: Rasmus Fogh wrote: ll1 = [y,1] y in ll1 True ll2 = [1,y] y in ll2 Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() I think you could be safe calling this a bug with numpy. Only in the sense that there are special cases where the array elements are all true, or all false, and numpy *could* safely return a bool. But special cases are not special enough to break the rules. Better for the numpy caller to write this: a.all() # or any() instead of: try: bool(a) except ValueError: a.all() as they would need to do if numpy sometimes returned a bool and sometimes raised an exception. I'm missing how a.all() solves the problem Rasmus describes, namely that the order of a python *list* affects the results of containment tests by numpy.array. E.g. y in ll1 and y in ll2 evaluate to different results in his example. It still seems like a bug in numpy to me, even if too much other stuff is broken if you fix it (in which case it apparently becomes an issue). It's an issue, if anything, not a bug. There is no consistent implementation of bool(some_array) that works in all cases. numpy's predecessor Numeric used to implement this as returning True if at least one element was non-zero. This works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not work well for bool(x==y) (which should be (x==y).all()), but many people got confused and thought that bool(x==y) worked. When we made numpy, we decided to explicitly not allow bool(some_array) so that people will not write buggy code like this again. You are so right, Robert: The deficiency is in the feature of rich comparisons, not numpy's implementation of it. 
__eq__() is allowed to return non-booleans; however, there are some parts of Python's implementation like list.__contains__() that still expect the return value of __eq__() to be meaningfully cast to a boolean. One might argue if this is a deficiency in rich comparisons or a rather a bug in list, set and dict. Certainly numpy is following the rules. In fact numpy should be applauded for throwing an error rather than returning a misleading value. For my personal problem I could indeed wrap all objects in a wrapper with whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) So, I would much prefer a language change. I am not competent to even propose one properly, but I'll try. First, to clear the air: Rich comparisons, the ability to overload '==', and the constraints (or lack of them) on __eq__ must stay unchanged. There are reasons for their current behaviour - ieee754 is particularly convincing - and anyway they are not going to change. No point in trying. There remains the problem is that __eq__ is used inside python 'collections' (list, set, dict etc.), and that the kind of overloading used (quite legitimately) in numpy etc. breaks the collection behaviour. It seems that proper behaviour of the collections requires an equality test that satisfies: 1) x equal x 2) x equal y = y equal x 3) x equal y and y equal z = x equal z 4) (x equal y) is a boolean 5) (x equal y) is defined (and will not throw an error) for all x,y 6) x unequal y == not(x equal y) (by definition) Note to TJR: 5) does not mean that Python should magically shield me from errors. All I am asking is that programmers design their equal() function to avoid raising errors, and that errors raised from equal() clearly count as bugs. I cannot imagine getting the collections to work in a simple and intuitive manner without an equality test that satisfies 1)-6). 
Maybe somebody else can. Instead I would propose adding an __equal__ special method for the purpose. It looks like the current collections use the folowing, at least in part def oldCollectionTest(x,y): if x is y: return True else: return (x == y) I would propose adding a new __equal__ method that satisfies 2) - 6) above. We could then define def newCollectionTest(x,y): if x is y: # this takes care of satisfying 1) return True elif hasattr(x, '__equal__'): return x.__equal__(y) elif hasattr(y, '__equal__'): return y.__equal__(x) else: return False The implementations for list, set and dict would then behave according to newCollectionTest. We would also want an equal() built-in with the same behaviour. In plain words, the default behaviour would be identity semantics. Objects that wanted value semantics could implement an __equal__ function with the correct behaviour. Wherever possible __equal__ would be the same as __eq__. This function may deviate from 'proper' behaviour in some cases. All I claim
Re: Rich Comparisons Gotcha
On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Regarding the mythical IEEE 754, although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. It is C and probably most other languages that turn that into false (as they want a dummy value, not an error.) http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=stq=why+does+NaN+not+equal+itself%3Frnum=3hl=enpli=1 -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Robert Kern wrote: There is an explicit policy that __eq__() methods can return non-bools for various purposes. I consider that policy to a presence that can be removed. There is no check because that policy exists, not the other way around. OK, presence in manual versus presence in code. Anyways, this is really a semantic digression, and not particularly important. Peace? Yes -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Rhamphoryncus wrote: On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to accommodate both requirements. Additionally, there is significant flexibility in trapping the signals. Regarding the mythical IEEE 754, although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. It is C and probably most other languages that turn that into false (as they want a dummy value, not an error.) http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=stq=why+does+NaN+not+equal+itself%3Frnum=3hl=enpli=1 Table 4 on page 9 of the standard is pretty clear on the subject. When the two operands are unordered, the operator == returns False. The standard defines how to do comparisons notionally; two operands can be greater than, less than, equal or unordered. It then goes on to map these notional concepts to programming language boolean predicates. 
-- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
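How the "unordered" outcome surfaces in Python can be seen with plain floats. This is just an observation of standard float behaviour, not a quotation from the standard.

```python
import math

nan = float("nan")

# When either operand is NaN the operands are unordered, and every
# ordering or equality predicate answers False.
results = {
    "nan == nan": nan == nan,
    "nan < 1.0": nan < 1.0,
    "nan > 1.0": nan > 1.0,
    "nan <= nan": nan <= nan,
    "nan >= nan": nan >= nan,
}

# Only != answers True, since it is defined as "not equal".
not_equal = (nan != nan)

# The reliable way to detect a NaN is an explicit test.
is_nan = math.isnan(nan)
```

Every entry in results is False, while not_equal and is_nan are True.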
Re: Rich Comparisons Gotcha
On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to accommodate both requirements. Additionally, there is significant flexibility in trapping the signals. Right, but most of that's lower level. By the time it reaches Python we only care about quiet NaNs. Regarding the mythical IEEE 754, although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. It is C and probably most other languages that turn that into false (as they want a dummy value, not an error.) http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr... Table 4 on page 9 of the standard is pretty clear on the subject. When the two operands are unordered, the operator == returns False. The standard defines how to do comparisons notionally; two operands can be greater than, less than, equal or unordered. It then goes on to map these notional concepts to programming language boolean predicates. 
Ahh, interesting. Still though, does it give an explanation for such behaviour, or use cases? There must be some situation where blindly returning false is enough benefit to trump screwing up sorting. -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Rhamphoryncus wrote: On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to accommodate both requirements. Additionally, there is significant flexibility in trapping the signals. Right, but most of that's lower level. By the time it reaches Python we only care about quiet NaNs. No, signaling NaNs raise the exception that you are asking for. You're right that if you get a Python float object that is a NaN, it is probably going to be quiet, but signaling NaNs can affect Python in the way that you want. Regarding the mythical IEEE 754, although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. It is C and probably most other languages that turn that into false (as they want a dummy value, not an error.) http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr... Table 4 on page 9 of the standard is pretty clear on the subject. 
When the two operands are unordered, the operator == returns False. The standard defines how to do comparisons notionally; two operands can be greater than, less than, equal or unordered. It then goes on to map these notional concepts to programming language boolean predicates. Ahh, interesting. Still though, does it give an explanation for such behaviour, or use cases? There must be some situation where blindly returning false is enough benefit to trump screwing up sorting. Well, the standard was written in the days of Fortran. You didn't really have generic sorting routines. You *could* implement whatever ordering you wanted because you *had* to implement the ordering yourself. You didn't have to use a limited boolean predicate. Basically, the boolean predicates have to return either True or False. Neither one is really satisfactory, but that's the constraint you're under. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Dec 8, 1:04 pm, Robert Kern [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to accommodate both requirements. Additionally, there is significant flexibility in trapping the signals. Right, but most of that's lower level. By the time it reaches Python we only care about quiet NaNs. No, signaling NaNs raise the exception that you are asking for. You're right that if you get a Python float object that is a NaN, it is probably going to be quiet, but signaling NaNs can affect Python in the way that you want. Regarding the mythical IEEE 754, although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. It is C and probably most other languages that turn that into false (as they want a dummy value, not an error.) http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr... 
Table 4 on page 9 of the standard is pretty clear on the subject. When the two operands are unordered, the operator == returns False. The standard defines how to do comparisons notionally; two operands can be greater than, less than, equal or unordered. It then goes on to map these notional concepts to programming language boolean predicates. Ahh, interesting. Still though, does it give an explanation for such behaviour, or use cases? There must be some situation where blindly returning false is enough benefit to trump screwing up sorting. Well, the standard was written in the days of Fortran. You didn't really have generic sorting routines. You *could* implement whatever ordering you wanted because you *had* to implement the ordering yourself. You didn't have to use a limited boolean predicate. Basically, the boolean predicates have to return either True or False. Neither one is really satisfactory, but that's the constraint you're under. We've always done it that way is NOT a use case! Certainly, it's a factor, but it seems quite weak compared to the sort use case. I suppose what I'm hoping for is an small example program (one or a few functions) that needs the always false behaviour of NaN. -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Rhamphoryncus wrote: On Dec 8, 1:04 pm, Robert Kern [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to accommodate both requirements. Additionally, there is significant flexibility in trapping the signals. Right, but most of that's lower level. By the time it reaches Python we only care about quiet NaNs. No, signaling NaNs raise the exception that you are asking for. You're right that if you get a Python float object that is a NaN, it is probably going to be quiet, but signaling NaNs can affect Python in the way that you want. Regarding the mythical IEEE 754, although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. It is C and probably most other languages that turn that into false (as they want a dummy value, not an error.) http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr... 
Table 4 on page 9 of the standard is pretty clear on the subject. When the two operands are unordered, the operator == returns False. The standard defines how to do comparisons notionally; two operands can be greater than, less than, equal or unordered. It then goes on to map these notional concepts to programming language boolean predicates. Ahh, interesting. Still though, does it give an explanation for such behaviour, or use cases? There must be some situation where blindly returning false is enough benefit to trump screwing up sorting. Well, the standard was written in the days of Fortran. You didn't really have generic sorting routines. You *could* implement whatever ordering you wanted because you *had* to implement the ordering yourself. You didn't have to use a limited boolean predicate. Basically, the boolean predicates have to return either True or False. Neither one is really satisfactory, but that's the constraint you're under. We've always done it that way is NOT a use case! Certainly, it's a factor, but it seems quite weak compared to the sort use case. I didn't say it was. I was explaining that sorting was probably *not* a use case for the boolean predicates at the time of writing of the standard. In fact, it suggests implementing a Compare() function that returns greater than, less than, equal or unordered in addition to the boolean predicates. That Python eventually chose to use a generic boolean predicate as the basis of its sorting routine many years after the IEEE-754 standard is another matter entirely. In any case, the standard itself is quite short, and does not spend much time justifying itself in any detail. I suppose what I'm hoping for is an small example program (one or a few functions) that needs the always false behaviour of NaN. Steven D'Aprano gave one earlier in the thread. Additionally, (x!=x) is a simple test for NaNs if an IsNaN(x) function is not available. 
Really, though, the result falls out from the way that IEEE-754 constructed the logic of the system. It is not defined that (NaN==NaN) should return False, per se. Rather, all of the boolean predicates are defined in terms of that Compare(x,y) function. If that function returns unordered, then (x==y) is False. It doesn't matter if one or both are NaNs; in either case, the result is unordered. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
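The relationship described above, where the boolean predicates are derived from a four-way comparison, can be sketched in a few lines of Python. This is only an illustration of the idea, not the standard's text: the names `compare`, `eq` and `lt` and the string results are assumptions made for this sketch.

```python
import math

def compare(x, y):
    """Four-way comparison: 'less', 'greater', 'equal' or 'unordered'.

    Hypothetical helper modelled on the Compare() function mentioned above.
    """
    if math.isnan(x) or math.isnan(y):
        return "unordered"   # one NaN operand is enough
    if x < y:
        return "less"
    if x > y:
        return "greater"
    return "equal"

def eq(x, y):
    # The boolean predicate must answer True or False, so 'unordered' maps to False.
    return compare(x, y) == "equal"

def lt(x, y):
    return compare(x, y) == "less"

nan = float("nan")
print(compare(1.0, 2.0))   # less
print(compare(nan, nan))   # unordered
print(eq(nan, nan), lt(nan, 1.0), nan != nan)   # False False True
```

Seen this way, `nan == nan` being False is not a special rule about NaN equality; it is what remains once "unordered" has to be squeezed into a two-valued answer.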
Re: Rich Comparisons Gotcha
Rasmus Fogh wrote: For my personal problem I could indeed wrap all objects in a wrapper with whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit I was not suggesting that you wrap *everything*, merely an adaptor for numpy arrays in whatever subclass and source it is that feeds them to your code. It is fairly unusual, I think, to find numpy arrays 'in the wild', outside the constrained context of numerical code where the programmer uses them intentionally and hopefully understands their peculiarities. much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) Even if rich comparisons worked as you propose, the above would *still* not necessarily work. Collection classes can define a __contains__ that overrides the default and that can do anything, though True/False is recommended. As best I can think of at the moment, the only things you can absolutely depend on are that builtin id(ob) will return an int, that 'ob1 is ob2' (based on id()) will be True or False, and that builtin type(ob) will be a class (at least in 3.0, not sure of 2.x). The names can be rebound but you can control that within the module you write. This is what I meant when I said that 'generic' nearly always needs to be qualified to something like 'generic for objects that meet the interface requirements'. Every function has that precondition as part of its implied contract. Your code has an interface requirement that 'x in y' not raise an exception. An x, y pair that does is outside its contract. Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Terry Reedy wrote: Rasmus Fogh wrote: much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) Even if rich comparisons as you propose, the above would *still* not necessarily work. Collection classes can define a __contains__ that overrides the default and that can do anything, though True/False is recommended. No, it's actually required.

In [4]: class A(object):
   ...:     def __contains__(self, other):
   ...:         return 'foo'
   ...:
   ...:

In [7]: a = A()

In [8]: 1 in a
Out[8]: True

Okay, so it will coerce to True/False for you, but unlike rich comparisons, the return value must be interpretable as a boolean. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
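The contrast drawn above, that `in` coerces the result of `__contains__` to a boolean while `==` hands back whatever `__eq__` returns, can be checked side by side. A minimal sketch; the classes `A` and `B` are purely illustrative.

```python
class A:
    def __contains__(self, other):
        return "foo"          # truthy, but not a bool

class B:
    def __eq__(self, other):
        return "foo"          # rich comparison: returned to the caller as-is

result_in = 1 in A()
result_eq = B() == 1

print(type(result_in).__name__, result_in)   # bool True
print(type(result_eq).__name__, result_eq)   # str foo
```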
Re: Rich Comparisons Gotcha
Terry Reedy wrote: Rasmus Fogh wrote: For my personal problem I could indeed wrap all objects in a wrapper with whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit I was not suggesting that you wrap *everything*, merely an adaptor for numpy arrays in whatever subclass and source it is that feeds them to your code. It is fairly unusual, I think, to find numpy arrays 'in the wild', outside the constrained context of numerical code where the programmer uses them intentionally and hopefully understands their peculiarities. much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) Even if rich comparisons as you propose, the above would *still* not necessarily work. Collection classes can define a __contains__ that overrides the default and that can do anything, though True/False is recommended. If you have a list of results and you want to see whether one of them is Nan then the obvious way is Nan in results, but __contains__ uses __eq__ and Nan == Nan returns False, so Nan in results returns False. Hmm... Nan is Nan returns True, so if there was a version of __contains__ which used is then Nan in results would return True. Perhaps Nan is in results? Or would that be too confusing, ie in vs is in? As best I can think of at the moment, the only things you can absolutely depend on is that builtin id(ob) will return an int, that 'ob1 is ob2' (based in id()) will be True or False, and that builtin type(ob) will be a class (at least in 3.0, not sure of 2.x). The names can be rebound but you can control that within the module you write. I wonder whether there could be some syntactic sugar which would wrap try...except... around an expression, eg except(foo(), False), which would return False if foo() raised an exception, otherwise return the result of foo(). 
This is what I meant when I said that 'generic' nearly always needs to be qualified to something like 'generic for objects that meet the interface requirements'. Every function has that precondition as part of its implied contract. Your code has an interface requirement that 'x in y' not raise an exception. An x,y pair that does it outside its contract. -- http://mail.python.org/mailman/listinfo/python-list
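The `except(foo(), False)` idea floated above cannot be written as an ordinary function call, because `foo()` would be evaluated before the wrapper ever runs. A rough approximation, assuming the caller is willing to pass a callable instead of an expression, is sketched here; `attempt` and `Picky` are hypothetical names invented for the example.

```python
def attempt(func, default, *args, **kwargs):
    """Call func(*args, **kwargs); return default if it raises an Exception."""
    try:
        return func(*args, **kwargs)
    except Exception:
        return default

class Picky:
    """Hypothetical container whose membership test always raises."""
    def __contains__(self, item):
        raise ValueError("cannot decide")

# The expression is wrapped in a lambda so that it is evaluated inside the try block.
print(attempt(lambda: 1 in Picky(), False))   # False
print(attempt(lambda: 1 in [1, 2], False))    # True

ok = attempt(int, None, "42")
bad = attempt(int, None, "forty-two")
print(ok, bad)                                # 42 None
```

Catching every `Exception` is a blunt instrument: it also hides genuine bugs inside `func`, which is one reason to be cautious about such sugar.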
Re: Rich Comparisons Gotcha
MRAB wrote: Terry Reedy wrote: Rasmus Fogh wrote: For my personal problem I could indeed wrap all objects in a wrapper with whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit I was not suggesting that you wrap *everything*, merely an adaptor for numpy arrays in whatever subclass and source it is that feeds them to your code. It is fairly unusual, I think, to find numpy arrays 'in the wild', outside the constrained context of numerical code where the programmer uses them intentionally and hopefully understands their peculiarities. much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) Even if rich comparisons as you propose, the above would *still* not necessarily work. Collection classes can define a __contains__ that overrides the default and that can do anything, though True/False is recommended. If you have a list of results and you want to see whether one of them is Nan then the obvious way is Nan in results, but __contains__ uses __eq__ and Nan == Nan returns False, so Nan in results returns False. Hmm... Nan is Nan returns True, However, Nan is SomeOtherNan does not return True. so if there was a version of __contains__ which used is then Nan in results would return True. Perhaps Nan is in results? Or would that be too confusing, ie in vs is in? list.__contains__() already checks with is before it tries ==.

In [65]: from numpy import nan, inf

In [66]: other_nan = inf/inf

In [67]: nan in [nan]
Out[67]: True

In [68]: nan in [other_nan]
Out[68]: False

-- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
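The identity-before-equality check described above can be reproduced with plain floats, without numpy. In this sketch `other_nan` is built from `inf - inf` rather than numpy's `inf/inf`; that substitution is an assumption made only to keep the example free of dependencies.

```python
nan = float("nan")
other_nan = float("inf") - float("inf")   # a second, distinct NaN object

same_object = nan in [nan]          # found by identity (the `is` check)
different_object = nan in [other_nan]   # identity fails, then == fails too
by_equality_only = any(nan == item for item in [nan])

print(same_object)         # True
print(different_object)    # False
print(by_equality_only)    # False
```

So `x in alist` answers "is this very object, or something equal to it, in the list", which is why a NaN can be found in a list while still not comparing equal to the element it was found at.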
Re: Rich Comparisons Gotcha
On Dec 8, 2:51 pm, Robert Kern [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: We've always done it that way is NOT a use case! Certainly, it's a factor, but it seems quite weak compared to the sort use case. I didn't say it was. I was explaining that sorting was probably *not* a use case for the boolean predicates at the time of writing of the standard. In fact, it suggests implementing a Compare() function that returns greater than, less than, equal or unordered in addition to the boolean predicates. That Python eventually chose to use a generic boolean predicate as the basis of its sorting routine many years after the IEEE-754 standard is another matter entirely. I interpret that to mean IEEE 754's semantics are for different circumstances and are inapplicable to Python. In any case, the standard itself is quite short, and does not spend much time justifying itself in any detail. A pity, as it is often invoked to explain language design. I suppose what I'm hoping for is an small example program (one or a few functions) that needs the always false behaviour of NaN. Steven D'Aprano gave one earlier in the thread. I see examples of behaviour, but no use cases. Additionally, (x!=x) is a simple test for NaNs if an IsNaN(x) function is not available. That's a trick to work around the lack of IsNaN(x). Again, not a use case. Really, though, the result falls out from the way that IEEE-754 constructed the logic of the system. It is not defined that (NaN==NaN) should return False, per se. Rather, all of the boolean predicates are defined in terms of that Compare(x,y) function. If that function returns unordered, then (x==y) is False. It doesn't matter if one or both are NaNs; in either case, the result is unordered. And if I arbitrarily dictate that NaN is a single value which is orderable, sorting just above -Infinity, then all the behaviour makes a lot more sense AND I fix sort. So you see the predicament I'm in. On the one hand we have a problem and an obvious solution. 
On the other hand we've got historical behaviour which everybody insists *must* remain, reasons unknown. It reeks of the Parable of the Monkeys. I think I should head over to one of the math groups and see if they can find a reason for it. -- http://mail.python.org/mailman/listinfo/python-list
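The ordering proposed above, with NaN treated as a single value sorting just above -Infinity, does not need a language change to try out: it can be expressed as a sort key. A minimal sketch; `nan_key` and the rank scheme are assumptions of this example, not anything Python or IEEE 754 defines.

```python
import math

def nan_key(x):
    """Sort key placing every NaN just above -inf and below all other floats."""
    if x == -math.inf:
        return (0, 0.0)
    if math.isnan(x):
        return (1, 0.0)   # all NaNs share one key, so they behave as a single value
    return (2, x)

data = [3.0, float("nan"), -math.inf, 1.0, math.inf, float("nan"), -2.0]
ordered = sorted(data, key=nan_key)
print(ordered)   # [-inf, nan, nan, -2.0, 1.0, 3.0, inf]
```

Because the key tuples never contain a NaN, every comparison made by `sorted` is well defined and the result no longer depends on where the NaNs happened to start.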
Re: Rich Comparisons Gotcha
Robert Kern wrote: Terry Reedy wrote: Rasmus Fogh wrote: much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) Even if rich comparisons as you propose, the above would *still* not necessarily work. Collection classes can define a __contains__ that overrides the default and that can do anything, though True/False is recommended. No, it's actually required. In [4]: class A(object): def __contains__(self, other): return 'foo' ...: ...: In [7]: a = A() In [8]: 1 in a Out[8]: True Okay, so it will coerce to True/False for you, but unlike rich comparisons, the return value must be interpretable as a boolean. Interesting. I did not expect that from Should return true if item is in self, false otherwise., but maybe the lowercase true/false is an (undocumented?) abbreviation for 'object with Boolean value True/False'. Of course, if the return value is not so interpretable, or if __contains__ raises an exception, there is no coercion and the OP's code will not work. A different summary of my main point in this thread: Dynamic binding and special method hooks make somewhat generic code possible, but the same special method hooks make absolutely generic code nearly impossible. tjr -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote: For my personal problem I could indeed wrap all objects in a wrapper with whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit much, though, just to get code like this to work as intended: alist.append(x) print ('x is present: ', x in alist) So, I would much prefer a language change. I am not competent to even propose one properly, but I'll try. You think changing the language is easier than applying a wrapper to your own data??? Oh my, that's too funny for words. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Mon, 08 Dec 2008 10:20:56 -0800, Rhamphoryncus wrote: On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Regarding the mythical IEEE 754, It's hardly mythical. http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933 although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. Unordered means that none of the following is true: x < NaN, x > NaN, x == NaN. It doesn't mean that comparing a NaN with something else is an error. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 16:24:58 -0800, George Sakkis wrote: On Dec 7, 6:37 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: ...

Given:

x = log(-5)  # a NaN
y = log(-2)  # the same NaN
x == y  # Some people want this to be true for NaNs.

Then:

# Compare x and y directly.
log(-5) == log(-2)
# If x == y then exp(x) == exp(y) for all x, y.
exp(log(-5)) == exp(log(-2))
-5 == -2

and now the entire foundations of mathematics collapses into a steaming pile of rubble.

And why doesn't this happen with the current behavior if x = y = log (-5) ? According to the same proof, -5 != -5.

You're right, I was a little sloppy in my proof. There are additional subtleties going on. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Dec 8, 7:44 pm, Steven D'Aprano [EMAIL PROTECTED] wrote: On Mon, 08 Dec 2008 10:20:56 -0800, Rhamphoryncus wrote: On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Mathematically, NaNs shouldn't be comparable at all. They should raise an exception when compared. In fact, they should raise an exception when *created*. But that's not what we want. What we want is a dummy value that silently plods through our calculations. For a dummy value it seems a lot more sense to pick an arbitrary yet consistent sort order (I suggest just above -Inf), rather than quietly screwing up the sort. Regarding the mythical IEEE 754, It's hardly mythical. http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933 I consider it to be mythical because most knowledge of it is indirect. Few who use floating point have the documents available to them. Requiring purchase/membership is the cause of this. although it's extremely rare to find quotations, I have one on just this subject. And it does NOT say x == NaN gives false. It says it gives *unordered*. Unordered means that none of the following is true: x < NaN, x > NaN, x == NaN. It doesn't mean that comparing a NaN with something else is an error. Robert Kern already clarified that. My confusion was due to relying on second-hand knowledge. -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Rasmus Fogh wrote: Dear All, For the first time I have come across a Python feature that seems completely wrong. After the introduction of rich comparisons, equality comparison does not have to return a truth value, and may indeed return nothing at all and throw an error instead. As a result, code like if foo == bar: or foo in alist cannot be relied on to work. This is clearly no accident. According to the documentation all comparison operators are allowed to return non-booleans, or to throw errors. There is explicitly no guarantee that x == x is True. I'm not a computer scientist, so my language and perspective on the topic may be a bit naive, but I'll try to demonstrate my caveman understanding by example. First, here is why the ability to throw an error is a feature:

class Apple(object):
    def __init__(self, appleness):
        self.appleness = appleness
    def __cmp__(self, other):
        assert isinstance(other, Apple), 'must compare apples to apples'
        return cmp(self.appleness, other.appleness)

class Orange(object): pass

Apple(42) == Orange()

Second, consider that any value in python also evaluates to a truth value in boolean context. Third, every function returns something. A function's returning nothing is not a possibility in the python language. None is something but evaluates to False in boolean context. But surely you can define an equal/unequal classification for all types of object, if you want to? This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)? Even in the realm of pure mathematics, the generality of objects (i.e. numbers) cannot be assumed. James -- James Stroud UCLA-DOE Institute for Genomics and Proteomics Box 951570 Los Angeles, CA 90095 http://www.jamesstroud.com -- http://mail.python.org/mailman/listinfo/python-list
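The Apple/Orange example above relies on `__cmp__` and `cmp()`, which exist only in Python 2. A rough Python 3 rendering of the same point is sketched below, assuming equality is the only comparison of interest. Raising `TypeError` from `__eq__` is a choice made here to mirror the original's refusal to compare; returning `NotImplemented` would be the more conventional behaviour.

```python
class Apple:
    def __init__(self, appleness):
        self.appleness = appleness

    def __eq__(self, other):
        # Deliberately refuse cross-type comparison instead of answering False.
        if not isinstance(other, Apple):
            raise TypeError("must compare apples to apples")
        return self.appleness == other.appleness

class Orange:
    pass

print(Apple(42) == Apple(42))   # True

message = None
try:
    Apple(42) == Orange()
except TypeError as exc:
    message = str(exc)
    print("refused:", message)
```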
Re: Rich Comparisons Gotcha
Robert Kern Wrote: Terry Reedy wrote: Rasmus Fogh wrote: Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed, What you are calling a misfeature is an absence, not a presence that can be removed. That's not quite true. Rich comparisons explicitly allow non-boolean return values. Breaking up __cmp__ into multiple __special__ methods was not the sole purpose of rich comparisons. One of the prime examples at the time was numpy (well, Numeric at the time). We wanted to use == to be able to return an array with boolean values where the two operand arrays were equal. E.g.

In [1]: from numpy import *

In [2]: array([1, 2, 3]) == array([4, 2, 3])
Out[2]: array([False, True, True], dtype=bool)

SQLAlchemy uses these operators to build up objects that will be turned into SQL expressions.

>>> print users.c.id==addresses.c.user_id
users.id = addresses.user_id

Basically, the idea was to turn these operators into full-fledged operators like +-/*. Returning a non-boolean violates neither the letter, nor the spirit of the feature. Unfortunately, if you do overload __eq__ to build up expressions or whatnot, the other places where users of __eq__ are implicitly expecting a boolean break. While I was (and am) a supporter of rich comparisons, I feel Rasmus's pain from time to time. It would be nice to have an alternate method to express the boolean yes, this thing is equal in value to that other thing. Unfortunately, I haven't figured out a good way to fit it in now without sacrificing rich comparisons entirely. The best way, IMHO, would have been to use an alternative notation in numpy and SQLalchemy, and have '==' always return only a truth value - it could be a non-boolean as long as the bool() function gave the correct result. Surely the extra convenience of overloading '==' in special cases was not worth breaking such basic operations as 'bool(x == y)' or 'x in alist'. Again, the problem is only with '==', not with '<', '<=' etc.
Of course it is done now, and unlikely to be reversed.

> > > and constrain the __eq__ function to always return a truth value.
> >
> > It is impossible to do that with certainty by any mechanical creation-time checking. So the implementation of operator.eq would have to check the return value of the ob.__eq__ function it calls *every time*. That would slow down the speed of the 99.xx% of cases where the check is not needed and would still not prevent exceptions. And if the return value was bad, all operator.eq could do is raise an exception anyway.
>
> Sure, but then it would be a bug to return a non-boolean from __eq__ and friends. It is not a bug today. I think that's what Rasmus is proposing.

Yes, that is the point. If __eq__ functions are *supposed* to return booleans I can write generic code that will work for well-behaved objects, and any errors will be somebody else's fault. If __eq__ is free to return anything, or throw an error, it becomes my responsibility to write generic code that will work anyway, including with floating point numbers, numpy, or SQLAlchemy. And I cannot see any way to do that (suggestions welcome). If purportedly general code does not work with numpy, your average numpy user will not be receptive to the idea that it is all numpy's fault.

Current behaviour is both inconsistent and counterintuitive, as these examples show.

    >>> x = float('NaN')
    >>> x == x
    False
    >>> ll = [x]
    >>> x in ll
    True
    >>> x == ll[0]
    False

    >>> import numpy
    >>> y = numpy.zeros((3,))
    >>> y
    array([ 0.,  0.,  0.])
    >>> bool(y==y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    >>> ll1 = [y,1]
    >>> y in ll1
    True
    >>> ll2 = [1,y]
    >>> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Can anybody see a way this could be fixed (please)? I may well have to live with it, but I would really prefer not to.
--- Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED] Dept. of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002 -- http://mail.python.org/mailman/listinfo/python-list
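The order dependence in the session above can be reproduced without numpy. The following is only a sketch in Python 3: `Elementwise` and `contains` are hypothetical names invented for illustration, and the class imitates just the two relevant traits of an array (an `==` that returns an elementwise result, and a truth test that refuses to guess). It suggests that list membership checks identity before it ever calls `==`.

```python
class Elementwise:
    """Hypothetical stand-in for an array type whose == works element by element."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Compare against another Elementwise, or broadcast a scalar.
        if isinstance(other, Elementwise):
            pairs = zip(self.values, other.values)
        else:
            pairs = ((v, other) for v in self.values)
        return Elementwise(a == b for a, b in pairs)

    def __bool__(self):
        # Refuse to choose between any() and all(), as numpy does.
        raise ValueError("truth value of an elementwise result is ambiguous")


def contains(container, item):
    """Return the result of `item in container`, or the name of the error raised."""
    try:
        return item in container
    except ValueError as exc:
        return type(exc).__name__


nan = float("nan")
y = Elementwise([0.0, 0.0, 0.0])

print(nan == nan)            # False: IEEE 754 semantics
print(nan in [nan])          # True: same object, so == is never consulted
print(contains([y, 1], y))   # True: y is found by identity at index 0
print(contains([1, y], y))   # ValueError: 1 == y is evaluated first and bool() fails
```

If this reading is right, the inconsistency comes from mixing an identity shortcut with an `==` whose result has no truth value, rather than from either piece alone.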
Re: Rich Comparisons Gotcha
James Stroud wrote:
> Rasmus Fogh wrote:
> > Dear All,
> >
> > For the first time I have come across a Python feature that seems completely wrong. After the introduction of rich comparisons, equality comparison does not have to return a truth value, and may indeed return nothing at all and throw an error instead. As a result, code like
> >
> >     if foo == bar:
> >
> > or
> >
> >     foo in alist
> >
> > cannot be relied on to work.
> >
> > This is clearly no accident. According to the documentation all comparison operators are allowed to return non-booleans, or to throw errors. There is explicitly no guarantee that x == x is True.
>
> I'm not a computer scientist, so my language and perspective on the topic may be a bit naive, but I'll try to demonstrate my caveman understanding by example.
>
> First, here is why the ability to throw an error is a feature:
>
>     class Apple(object):
>         def __init__(self, appleness):
>             self.appleness = appleness
>         def __cmp__(self, other):
>             assert isinstance(other, Apple), 'must compare apples to apples'
>             return cmp(self.appleness, other.appleness)
>
>     class Orange(object): pass
>
>     Apple(42) == Orange()

True, but that does not hold for __eq__, only for __cmp__, and for __gt__, __le__, etc. Consider:

    class Apple(object):
        def __init__(self, appleness):
            self.appleness = appleness
        def __gt__(self, other):
            assert isinstance(other, Apple), 'must compare apples to apples'
            return (self.appleness > other.appleness)
        def __eq__(self, other):
            if isinstance(other, Apple):
                return (self.appleness == other.appleness)
            else:
                return False

> Second, consider that any value in python also evaluates to a truth value in boolean context.
>
> Third, every function returns something. A function's returning nothing is not a possibility in the python language. None is something but evaluates to False in boolean context.

Indeed. The requirement would be not that return_value was a boolean, but that bool(return_value) was defined and gave the correct result.
I understand that in some old Numeric/numpy version the numpy array __eq__ function returned a non-empty array, so that bool(numarray1 == numarray2) was true for any pair of arguments, which is one way of breaking '=='. In current numpy, even bool(numarray1 == 1) throws an error, which is another way of breaking '=='. But surely you can define an equal/unequal classification for all types of object, if you want to? This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)? Even in the realm of pure mathematics, the generality of objects (i.e. numbers) can not be assumed. It sounds like that problem is simpler in computing. sqrt(32) evaluates to 5.6568542494923806 on my computer. A complex number c with non-zero imaginary part would be unequal to sqrt(32) even if it so happened that c*c==32. Yours, Rasmus --- Dr. Rasmus H. Fogh Email: [EMAIL PROTECTED] Dept. of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002 -- http://mail.python.org/mailman/listinfo/python-list
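The distinction drawn in this message (equality answers for any operand, ordering may refuse) can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not the poster's code: it raises `TypeError` where the original used `assert`, because assertions are skipped under `python -O`, and the `basket` list is invented for the demonstration.

```python
class Apple:
    def __init__(self, appleness):
        self.appleness = appleness

    def __eq__(self, other):
        # Equality is total: anything that is not an Apple is simply unequal.
        if isinstance(other, Apple):
            return self.appleness == other.appleness
        return False

    def __gt__(self, other):
        # Ordering is partial: refusing to rank apples against oranges is fine.
        if not isinstance(other, Apple):
            raise TypeError("must compare apples to apples")
        return self.appleness > other.appleness


class Orange:
    pass


basket = [Orange(), Apple(7), Apple(42)]   # hypothetical mixed-type collection

print(Apple(42) == Orange())   # False, no exception
print(Apple(42) in basket)     # True: membership works across types
print(Apple(42) > Apple(7))    # True
```

With this shape, `Apple(42) > Orange()` still fails loudly, while searching a mixed collection never does.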
Re: Rich Comparisons Gotcha
Quoting James Stroud [EMAIL PROTECTED]:
> First, here is why the ability to throw an error is a feature:
>
>     class Apple(object):
>         def __init__(self, appleness):
>             self.appleness = appleness
>         def __cmp__(self, other):
>             assert isinstance(other, Apple), 'must compare apples to apples'
>             return cmp(self.appleness, other.appleness)
>
>     class Orange(object): pass
>
>     Apple(42) == Orange()

I beg to disagree. The right answer for the question "Am I equal to this chair right here?" is not "I don't know", nor "I can't compare". The answer is "No, I'm not a chair, thus I'm not equal to this chair right here". If someone comes to my house, looking for me, he will not run away because he sees a chair before he sees me. Your assert doesn't belong inside the method, it should be up to the caller to decide if the human-chair comparisons make sense or not. I certainly don't want to be type-checking when looking for an object within a mixed-type collection.

> This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?

I assume you meant sqrt(32i). Well, sqrt is a function, and if its result value is defined as 4+4i, then the answer is 'yes', otherwise, the answer should be no. sqrt(4) is *not* -2, and should not be equal to -2. The standard definition of the square root _function_ for real numbers is to take the non-negative real root. I haven't heard of a standard square root _function_ for complex numbers (there is of course, a definition of square root, but it is not a function). So, if by your definition of sqrt, sqrt(32i) returns a number, there is no ambiguity. -2 is not sqrt(4). If you need the answer to be 'True', you may be asking the wrong question.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 13:03:43 +, Rasmus Fogh wrote: Jamed Stroud Wrote: ... Second, consider that any value in python also evaluates to a truth value in boolean context. But bool(x) can fail too. So not every object in Python can be interpreted as a truth value. Third, every function returns something. Unless it doesn't return at all. A function's returning nothing is not a possibility in the python language. None is something but evaluates to False in boolean context. Indeed. The requirement would be not that return_value was a boolean, but that bool(return_value) was defined and gave the correct result. If __bool__ or __nonzero__ raises an exception, you would like Python to ignore the exception and return True or False. Which should it be? How do you know what the correct result should be? From the Zen of Python: In the face of ambiguity, refuse the temptation to guess. All binary operators are ambiguous when dealing with vector or array operands. Should the operator operate on the array as a whole, or on each element? The numpy people have decided that element-wise equality testing is more useful for them, and this is their prerogative to do so. In fact, the move to rich comparisons was driven by the needs of numpy. http://www.python.org/dev/peps/pep-0207/ It is a *VERY* important third-party library, and this was not the first and probably won't be the last time that their needs will move into Python the language. Python encourages such domain-specific behaviour. In fact, that's what operator-overloading is all about: classes can define what any operator means for *them*. There's no requirement that the infinity of potential classes must all define operators in a mutually compatible fashion, not even for comparison operators. For example, consider a class implementing one particular version of three-value logic. 
It isn't enough for == to only return True or False, because you also need Maybe:

    True == False   => returns False
    True == True    => returns True
    True == Maybe   => returns Maybe

etc.

Or consider fuzzy logic, where instead of two truth values, you have a continuum of truth values between 0.0 and 1.0. What should comparing two such fuzzy values for equality return? A boolean True/False? Another fuzzy value?

Another one from the Zen:

    Special cases aren't special enough to break the rules.

The rules are that classes can customize their behaviour, that methods can fail, and that Python should not try to guess what the correct value should have been in the event of such a failure. Equality is a special case, but it isn't so special that it needs to be an exception from those rules.

If you really need a guaranteed-can't-fail[1] equality test, try something like this untested wrapper class:

    class EqualityWrapper(object):
        def __init__(self, obj):
            self.wrapped = obj
        def __eq__(self, other):
            try:
                return bool(self.wrapped == other)
            except Exception:
                return False  # or maybe True?

Now wrap all your data:

    data = [a list of arbitrary objects]
    data = map(EqualityWrapper, data)
    process(data)

[1] Not a guarantee.

-- Steven
-- http://mail.python.org/mailman/listinfo/python-list
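For readers who want to try the wrapper idea, here is a small Python 3 sketch of it. Two things are assumptions added for illustration and not part of the post: the wrapper unwraps the other operand so that two wrappers compare their payloads, and `Grumpy` is a made-up class whose `==` always raises. A list comprehension replaces `map`, which is lazy in Python 3.

```python
class EqualityWrapper:
    def __init__(self, obj):
        self.wrapped = obj

    def __eq__(self, other):
        # Assumption: unwrap the other side so wrapper-to-wrapper comparison works.
        if isinstance(other, EqualityWrapper):
            other = other.wrapped
        try:
            return bool(self.wrapped == other)
        except Exception:
            return False  # arbitrary choice, as the original comment admits


class Grumpy:
    """Hypothetical object whose == always raises."""

    def __eq__(self, other):
        raise ValueError("cannot compare")


data = [EqualityWrapper(obj) for obj in (1, "two", Grumpy())]

print(EqualityWrapper("two") in data)   # True
print(EqualityWrapper(3.0) in data)     # False, even though Grumpy raised
```

The cost is visible in the `except` branch: a failed comparison is silently reported as "not equal", which is exactly the guess the post warns about.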
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 13:03:43 +, Rasmus Fogh wrote: Jamed Stroud Wrote: ... Second, consider that any value in python also evaluates to a truth value in boolean context. But bool(x) can fail too. So not every object in Python can be interpreted as a truth value. Third, every function returns something. Unless it doesn't return at all. A function's returning nothing is not a possibility in the python language. None is something but evaluates to False in boolean context. Indeed. The requirement would be not that return_value was a boolean, but that bool(return_value) was defined and gave the correct result. If __bool__ or __nonzero__ raises an exception, you would like Python to ignore the exception and return True or False. Which should it be? How do you know what the correct result should be? From the Zen of Python: In the face of ambiguity, refuse the temptation to guess. All binary operators are ambiguous when dealing with vector or array operands. Should the operator operate on the array as a whole, or on each element? The numpy people have decided that element-wise equality testing is more useful for them, and this is their prerogative to do so. In fact, the move to rich comparisons was driven by the needs of numpy. http://www.python.org/dev/peps/pep-0207/ It is a *VERY* important third-party library, and this was not the first and probably won't be the last time that their needs will move into Python the language. Python encourages such domain-specific behaviour. In fact, that's what operator-overloading is all about: classes can define what any operator means for *them*. There's no requirement that the infinity of potential classes must all define operators in a mutually compatible fashion, not even for comparison operators. For example, consider a class implementing one particular version of three-value logic. 
It isn't enough for == to only return True or False, because you also need Maybe: True == False = returns False True == True = returns True True == Maybe = returns Maybe etc. Or consider fuzzy logic, where instead of two truth values, you have a continuum of truth values between 0.0 and 1.0. What should comparing two such fuzzy values for equality return? A boolean True/False? Another fuzzy value? Another one from the Zen: Special cases aren't special enough to break the rules. The rules are that classes can customize their behaviour, that methods can fail, and that Python should not try to guess what the correct value should have been in the event of such a failure. Equality is a special case, but it isn't so special that it needs to be an exception from those rules. If you really need a guaranteed-can't-fail[1] equality test, try something like this untested wrapper class: class EqualityWrapper(object): def __init__(self, obj): self.wrapped = obj def __eq__(self, other): try: return bool(self.wrapped == other) except Exception: return False # or maybe True? Now wrap all your data: data = [a list of arbitrary objects] data = map(EqualityWrapper, data) process(data) [1] Not a guarantee. Well, lots to think about. Just to keep you from shooting at straw men: I would have liked it to be part of the design contract (a convention, if you like) that 1) bool(x == y) should return a boolean and never throw an error 2) x == x return True I do *not* say that bool(x) should never throw an error. I do *not* say that Python should guess a return value if an __eq__ function throws an error, only that it should have been considered a bug, or at least bad form, for __eq__ functions to do so. 
What might be a sensible behaviour (unlike your proposed wrapper) would be the following:

    def eq(x, y):
        if x is y:
            return True
        else:
            try:
                return (x == y)
            except Exception:
                return False

If it is possible to change the language, how about having two different functions, one for overloading the '==' operator, and another for testing list and set membership, dictionary key identity, etc.? For instance like this

- Add a new function __equals__; x.__equals__(y) could default to bool(x.__eq__(y))
- Establish by convention that x.__equals__(y) must return a boolean and may not intentionally throw an error.
- Establish by convention that 'x is y' implies 'x.__equals__(y)' in the sense that (not (x is y and not x.__equals__(y))) must always hold
- Have the Python data structures call __equals__ when they want to compare objects internally (e.g. for 'x in alist', 'x in adict', 'set(alist)', etc.)
- Provide an equals(x,y) built-in that calls the __equals__ function
- numpy and others who (mis)use '==' for their own purposes could use

      def __equals__(self, other):
          return (self is other)

For the float NaN case it looks like things are already behaving like this. For numpy objects you would not lose anything, since 'numpyArray in alist' is broken
Re: Rich Comparisons Gotcha
On Dec 7, 4:23 pm, Rasmus Fogh [EMAIL PROTECTED] wrote:
> If it is possible to change the language, how about having two different functions, one for overloading the '==' operator, and another for testing list and set membership, dictionary key identity, etc.?

I've often thought that this would have made a lot of sense too, though I'd probably choose to spell the well-behaved structural equality == and the flexible numeric equality eq (a la Fortran). Hey, we could have *six* new keywords: eq, ne, le, lt, ge, gt!

See the recent (September?) thread "Comparing float and decimal" for some of the fun that results from lack of transitivity of equality.

But I think there's essentially no chance of Python changing to support this. And even if there were, Python's conflation of structural equality with numeric equality brings significant benefits in terms of readability of code, ease of learning, and general friendliness; it's only really troublesome in a few corner cases. Is the tradeoff worth it?

So for me, this comes down to a case of 'practicality beats purity'.

Mark

-- http://mail.python.org/mailman/listinfo/python-list
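The two-hook idea quoted in this message does not exist in Python, but it can be approximated in user code. The sketch below is purely hypothetical: `equals`, `member`, `__equals__` and the `Vec` class are illustration names, and `Vec.__equals__` compares values, whereas the proposal's suggested fallback for array types was the more conservative `self is other`.

```python
def equals(x, y):
    """Hypothetical helper: identity first, then __equals__, then bool(x == y)."""
    if x is y:
        return True
    hook = getattr(type(x), "__equals__", None)
    if hook is not None:
        return bool(hook(x, y))
    try:
        return bool(x == y)
    except Exception:
        return False


def member(item, container):
    """Membership test built on equals() instead of raw ==."""
    return any(equals(element, item) for element in container)


class Vec:
    """Hypothetical elementwise type that opts in to the proposed hook."""

    def __init__(self, *values):
        self.values = values

    def __eq__(self, other):
        # Elementwise result, deliberately not a plain boolean.
        return [a == b for a, b in zip(self.values, other.values)]

    def __equals__(self, other):
        return isinstance(other, Vec) and self.values == other.values


nan = float("nan")
print(equals(nan, nan))                    # True, by identity
print(equals(Vec(1, 2), Vec(1, 2)))        # True, via __equals__
print(member(Vec(1, 2), [1, Vec(3, 4)]))   # False, and no exception
```

Because containers would have to call such a helper internally, this only shows what the contract might look like; built-in `in`, `set` and `dict` are unaffected by it.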
Re: Rich Comparisons Gotcha
Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. ll = [x] x in ll True x == ll[0] False import numpy y = numpy.zeros((3,)) y array([ 0., 0., 0.]) bool(y==y) Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() ll1 = [y,1] y in ll1 True ll2 = [1,y] y in ll2 Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() Can anybody see a way this could be fixed (please)? I may well have to live with it, but I would really prefer not to. Make a concrete proposal for fixing it that does not break backwards compatibility. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Luis Zarrabeitia wrote:
> Quoting James Stroud [EMAIL PROTECTED]:
> > First, here is why the ability to throw an error is a feature:
> >
> >     class Apple(object):
> >         def __init__(self, appleness):
> >             self.appleness = appleness
> >         def __cmp__(self, other):
> >             assert isinstance(other, Apple), 'must compare apples to apples'
> >             return cmp(self.appleness, other.appleness)
> >
> >     class Orange(object): pass
> >
> >     Apple(42) == Orange()
>
> I beg to disagree. The right answer for the question "Am I equal to this chair right here?" is not "I don't know", nor "I can't compare". The answer is "No, I'm not a chair, thus I'm not equal to this chair right here". If someone comes to my house, looking for me, he will not run away because he sees a chair before he sees me. Your assert doesn't belong inside the method, it should be up to the caller to decide if the human-chair comparisons make sense or not. I certainly don't want to be type-checking when looking for an object within a mixed-type collection.
>
> > This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
>
> I assume you meant sqrt(32i).

No, I definitely didn't mean sqrt(32i). I'm using sqrt() to represent the mathematical square root, and not an arbitrary function one might define, by the way. My point is that 4 + 4i, sqrt(32), and sqrt(-32) all exist in different spaces. They are not comparable, even when testing for equality in a pure mathematical sense.

If we encounter these values in our programs, we might like the power to decide the results of these comparisons. In one context it might make sense to throw an exception, in another, it might make sense to return False based on the fact that we consider them different types, in yet another context, it might make sense to look at complex plane values as vectors and return their scalar magnitude for comparison to real numbers. I think this ability to define the results of comparisons is not a shortcoming of the language but a strength.
-- James Stroud UCLA-DOE Institute for Genomics and Proteomics Box 951570 Los Angeles, CA 90095 http://www.jamesstroud.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Perhaps this should raise an exception? I think the problem is not with comparisons in general but with the fact that nan is type float: py type(float('NaN')) type 'float' No float can be equal to nan, but nan is a float. How can something be not a number and a float at the same time? The illogicality of nan's type creates the possibility for the illogical results of comparisons to nan including comparing nan to itself. ll = [x] x in ll True x == ll[0] False But there is consistency on the basis of identity which is the test for containment (in): py x is x True py x in [x] True Identity and equality are two different concepts. Comparing identity to equality is like comparing apples to oranges ;o) import numpy y = numpy.zeros((3,)) y array([ 0., 0., 0.]) bool(y==y) Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() But the equality test is not what fails here. It's the cast to bool that fails, which for numpy works like a unary ufunc. The designers of numpy thought that this would be a more desirable behavior. The test for equality likewise is a binary ufunc and the behavior was chosen in numpy for practical reasons. I don't know if you can overload the == operator in C, but if you can, you would be able to achieve the same behavior. ll1 = [y,1] y in ll1 True ll2 = [1,y] y in ll2 Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() I think you could be safe calling this a bug with numpy. But the fact that someone can create a bug with a language is not a condemnation of the language. 
For example, C makes it real easy to crash a program by overrunning the limits of an array, but no one would suggest to remove arrays from C. Can anybody see a way this could be fixed (please)? I may well have to live with it, but I would really prefer not to. Your only hope is to somehow convince the language designers to remove the ability to overload == then get them to agree on what you think the proper behavior should be for comparisons. I think the probability of that happening is about zero, though, because such a change would run counter to the dynamic nature of the language. James -- James Stroud UCLA-DOE Institute for Genomics and Proteomics Box 951570 Los Angeles, CA 90095 http://www.jamesstroud.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Robert Kern wrote: Terry Reedy wrote: Rasmus Fogh wrote: Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed, What you are calling a misfeature is an absence, not a presence that can be removed. That's not quite true. In what way, pray tell. My statement still looks quite true to me. Rich comparisons explicitly allow non-boolean return values. They do so by not doing anything to the return value of the underlying method. As I said, the OP is complaining about an absence of a check. Moreover, the absence is intentional as I explained in the part snipped and as you further explained. And if the return value was bad, all operator.eq could do is raise and exception anyway. Sure, but then it would be a bug to return a non-boolean from __eq__ and friends. It is not a bug today. I think that's what Rasmus is proposing. Right, the addition of a check that is absent today. tjr -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
James Stroud wrote: [cast to bool] for numpy works like a unary ufunc. Scratch that. Not thinking and typing at same time. -- James Stroud UCLA-DOE Institute for Genomics and Proteomics Box 951570 Los Angeles, CA 90095 http://www.jamesstroud.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Rasmus Fogh wrote: Can anybody see a way this could be fixed (please)? I may well have to live with it, but I would really prefer not to. I made a suggestion in my first response, which perhaps you missed. tjr -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Re: Rich Comparisons Gotcha
James Stroud wrote:
> Rasmus Fogh wrote:
> > Current behaviour is both inconsistent and counterintuitive, as these examples show.
> >
> >     >>> x = float('NaN')
> >     >>> x == x
> >     False
>
> Perhaps this should raise an exception? I think the problem is not with comparisons in general but with the fact that nan is type float:
>
>     py> type(float('NaN'))
>     <type 'float'>
>
> No float can be equal to nan, but nan is a float. How can something be not a number and a float at the same time? The illogicality of nan's type creates the possibility for the illogical results of comparisons to nan including comparing nan to itself.

I initially thought that looked like a bug to me. But, this is apparently standard behavior required for NaN. I'm only using Wikipedia as a reference here, but about 80% of the way down, under standard operations:

http://en.wikipedia.org/wiki/IEEE_754-1985

    Comparison operations. NaN is treated specially in that NaN=NaN always returns false.

Presumably since floating point calculations return NaN for some operations, and one NaN is usually not equal to another, this is the required behavior. So not a Python issue (though understandably a bit confusing).

The array issue seems to be with one 3rd party library, and one can choose to use or not use their library, to ask them to change it, or even to decide to override their == operator, if one doesn't like the way it is designed.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 23:20:12 +, Steven D'Aprano wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Sorry, I should explain why. Given: x = log(-5) # a NaN y = log(-2) # the same NaN x == y # Some people want this to be true for NaNs. Then: # Compare x and y directly. log(-5) == log(-2) # If x == y then exp(x) == exp(y) for all x, y. exp(log(-5)) == exp(log(-2)) -5 == -2 and now the entire foundations of mathematics collapses into a steaming pile of rubble. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Perhaps this should raise an exception? Why on earth would you want checking equality on NaN to raise an exception??? What benefit does it give? I think the problem is not with comparisons in general but with the fact that nan is type float: py type(float('NaN')) type 'float' No float can be equal to nan, but nan is a float. How can something be not a number and a float at the same time? Because floats are not real numbers. They are *almost* numbers, they often (but not always) behave like numbers, but they're actually not numbers. The difference is subtle enough that it is easy to forget that floats are not numbers, but it's easy enough to find examples proving it: Some perfectly good numbers don't exist as floats: 2**-1 == 0.0 True Try as you might, you can't get the number 0.1 *exactly* as a float: 0.1 0.10001 For any numbers x and y not equal to zero, x+y != x. But that fails for floats: 1001.0 + 1e99 == 1e99 True The above is because of overflow. But even avoiding overflow doesn't solve the problem. With a little effort, you can also find examples of ordinary sized floats where (x+y)-y != x. 0.9+0.1-0.9 == 0.1 False import numpy y = numpy.zeros((3,)) y array([ 0., 0., 0.]) bool(y==y) Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() But the equality test is not what fails here. It's the cast to bool that fails And it is right to do so, because it is ambiguous and the library designers rightly avoided the temptation of guessing what result is needed. ll1 = [y,1] y in ll1 True ll2 = [1,y] y in ll2 Traceback (most recent call last): File stdin, line 1, in module ValueError: The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all() I think you could be safe calling this a bug with numpy. Only in the sense that there are special cases where the array elements are all true, or all false, and numpy *could* safely return a bool. But special cases are not special enough to break the rules. Better for the numpy caller to write this: a.all() # or any() instead of: try: bool(a) except ValueError: a.all() as they would need to do if numpy sometimes returned a bool and sometimes raised an exception. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
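The claim that floats only approximate real numbers is easy to check. The snippet below is a sketch in Python 3 using values chosen for this illustration (they are not the exact literals from the post), with `fractions.Fraction` used to expose the exact value a float stores.

```python
from fractions import Fraction

# A perfectly good positive real number that has no float representation:
print(2.0 ** -2000 == 0.0)               # True: the result underflows to zero

# 0.1 cannot be stored exactly; the float holds a nearby binary fraction.
print(Fraction(0.1) == Fraction(1, 10))  # False

# Adding a nonzero y can leave x unchanged when y is far below x's precision.
print(1e99 + 1001.0 == 1e99)             # True

# Even ordinary-sized values do not round-trip through + and -.
print((0.9 + 0.1) - 0.9 == 0.1)          # False
```

None of this involves NaN, which supports the point that surprising equality results are a property of floating point in general.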
Re: Rich Comparisons Gotcha
Steven D'Aprano wrote: On Sun, 07 Dec 2008 23:20:12 +, Steven D'Aprano wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Sorry, I should explain why. Given: x = log(-5) # a NaN y = log(-2) # the same NaN x == y # Some people want this to be true for NaNs. Then: # Compare x and y directly. log(-5) == log(-2) # If x == y then exp(x) == exp(y) for all x, y. exp(log(-5)) == exp(log(-2)) -5 == -2 and now the entire foundations of mathematics collapses into a steaming pile of rubble. I didn't mean to suggest that it was incorrect, just that that particular surprising behavior is not related to rich comparisons. Even if the OP gets an __equals__() or some such, NaN will still not compare equal to NaN. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
On Dec 7, 6:37 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On Sun, 07 Dec 2008 23:20:12 +, Steven D'Aprano wrote: On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote: Rasmus Fogh wrote: Current behaviour is both inconsistent and counterintuitive, as these examples show. x = float('NaN') x == x False Blame IEEE for that one. Rich comparisons have nothing to do with that one. There is nothing to blame them for. This is the correct behaviour. NaNs should *not* compare equal to themselves, that's mathematically incoherent. Sorry, I should explain why. Given: x = log(-5) # a NaN y = log(-2) # the same NaN x == y # Some people want this to be true for NaNs. Then: # Compare x and y directly. log(-5) == log(-2) # If x == y then exp(x) == exp(y) for all x, y. exp(log(-5)) == exp(log(-2)) -5 == -2 and now the entire foundations of mathematics collapses into a steaming pile of rubble. And why doesn't this happen with the current behavior if x = y = log (-5) ? According to the same proof, -5 != -5. George -- http://mail.python.org/mailman/listinfo/python-list
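The single-object case raised here can be tried directly. This is a sketch in Python 3 with one caveat that differs from the pseudocode in the thread: `math.log(-5)` raises `ValueError` rather than returning a NaN, so the NaN is created with `float("nan")` instead.

```python
import math

try:
    math.log(-5)
except ValueError as exc:
    print("math.log(-5) raises:", exc)

x = y = float("nan")                  # two names bound to one NaN object

print(x is y)                         # True
print(x == y)                         # False, even for the very same object
print(math.exp(x) == math.exp(y))     # False: NaN propagates through exp
print(math.isnan(x))                  # True: the dependable way to detect NaN
```

So `==` on NaN is not reflexive, and code that needs to ask "is this NaN?" should use `math.isnan` rather than any equality test.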
Re: Rich Comparisons Gotcha
Terry Reedy wrote: Robert Kern wrote: Terry Reedy wrote: Rasmus Fogh wrote: Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed, What you are calling a misfeature is an absence, not a presence that can be removed. That's not quite true. In what way, pray tell. My statement still looks quite true to me. There is an explicit policy that __eq__() methods can return non-bools for various purposes. I consider that policy to a presence that can be removed. There is no check because that policy exists, not the other way around. Anyways, this is really a semantic digression, and not particularly important. Peace? -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: Rich Comparisons Gotcha
Steven D'Aprano wrote:
> On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
>> Rasmus Fogh wrote:
>>>     ll1 = [y,1]
>>>     y in ll1
>>>     True
>>>     ll2 = [1,y]
>>>     y in ll2
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>
>> I think you could be safe calling this a bug with numpy.
>
> Only in the sense that there are special cases where the array elements are all true, or all false, and numpy *could* safely return a bool. But special cases are not special enough to break the rules. Better for the numpy caller to write this:
>
>     a.all()  # or any()
>
> instead of:
>
>     try:
>         bool(a)
>     except ValueError:
>         a.all()
>
> as they would need to do if numpy sometimes returned a bool and sometimes raised an exception.

I'm missing how a.all() solves the problem Rasmus describes, namely that the order of a python *list* affects the results of containment tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different results in his example. It still seems like a bug in numpy to me, even if too much other stuff is broken if you fix it (in which case it apparently becomes an issue).

James
Re: Rich Comparisons Gotcha
James Stroud wrote:
> Steven D'Aprano wrote:
>> On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
>>> Rasmus Fogh wrote:
>>>>     ll1 = [y,1]
>>>>     y in ll1
>>>>     True
>>>>     ll2 = [1,y]
>>>>     y in ll2
>>>>     Traceback (most recent call last):
>>>>       File "<stdin>", line 1, in <module>
>>>>     ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>>
>>> I think you could be safe calling this a bug with numpy.
>>
>> Only in the sense that there are special cases where the array elements are all true, or all false, and numpy *could* safely return a bool. But special cases are not special enough to break the rules. Better for the numpy caller to write this:
>>
>>     a.all()  # or any()
>>
>> instead of:
>>
>>     try:
>>         bool(a)
>>     except ValueError:
>>         a.all()
>>
>> as they would need to do if numpy sometimes returned a bool and sometimes raised an exception.
>
> I'm missing how a.all() solves the problem Rasmus describes, namely that the order of a python *list* affects the results of containment tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different results in his example. It still seems like a bug in numpy to me, even if too much other stuff is broken if you fix it (in which case it apparently becomes an issue).

It's an issue, if anything, not a bug. There is no consistent implementation of bool(some_array) that works in all cases. numpy's predecessor Numeric used to implement this as returning True if at least one element was non-zero. This works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not work well for bool(x==y) (which should be (x==y).all()), but many people got confused and thought that bool(x==y) worked. When we made numpy, we decided to explicitly not allow bool(some_array) so that people will not write buggy code like this again.

The deficiency is in the feature of rich comparisons, not numpy's implementation of it. __eq__() is allowed to return non-booleans; however, there are some parts of Python's implementation like list.__contains__() that still expect the return value of __eq__() to be meaningfully cast to a boolean.

--
Robert Kern
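The any()/all() ambiguity described above can be sketched without numpy. The class Elementwise below is a hypothetical stand-in written for this illustration, not numpy's implementation; it only mimics the two behaviours at issue: == returns per-element results, and bool() of such a result refuses to guess.

```python
class Elementwise(object):
    """Hypothetical stand-in for an array type (not numpy itself)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Rich comparison: return per-element results, not a single bool.
        return Elementwise(a == b for a, b in zip(self.values, other.values))

    def __ne__(self, other):
        return Elementwise(a != b for a, b in zip(self.values, other.values))

    def any(self):
        return any(self.values)

    def all(self):
        return all(self.values)

    def __bool__(self):
        # Refuse to choose between any() and all().
        raise ValueError("truth value is ambiguous; use any() or all()")

    __nonzero__ = __bool__   # Python 2 spelling of the same hook


x = Elementwise([1, 2, 3])
y = Elementwise([1, 2, 4])

print((x == y).all())   # False: the sequences are not equal
print((x != y).any())   # True: they differ somewhere
print((x == y).any())   # True: the "at least one element" answer, wrong for equality
```

The last line shows why an any()-style truth value misleads for ==: two sequences that differ still report True.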
Re: Rich Comparisons Gotcha
Robert Kern wrote:
> James Stroud wrote:
>> I'm missing how a.all() solves the problem Rasmus describes, namely that the order of a python *list* affects the results of containment tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different results in his example. It still seems like a bug in numpy to me, even if too much other stuff is broken if you fix it (in which case it apparently becomes an issue).
>
> It's an issue, if anything, not a bug. There is no consistent implementation of bool(some_array) that works in all cases. numpy's predecessor Numeric used to implement this as returning True if at least one element was non-zero. This works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not work well for bool(x==y) (which should be (x==y).all()), but many people got confused and thought that bool(x==y) worked. When we made numpy, we decided to explicitly not allow bool(some_array) so that people will not write buggy code like this again.
>
> The deficiency is in the feature of rich comparisons, not numpy's implementation of it. __eq__() is allowed to return non-booleans; however, there are some parts of Python's implementation like list.__contains__() that still expect the return value of __eq__() to be meaningfully cast to a boolean.

You have explained

    py> ll2 = [1, y]
    py> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is...

but not

    py> ll1 = [y,1]
    py> y in ll1
    True

It's this discrepancy that seems like a bug, not that a ValueError is raised in the former case, which is perfectly reasonable to me.

All I can imagine is that something like the following lives in the bowels of the python code for list:

    def __contains__(self, other):
        foundit = False
        for i, v in enumerate(self):
            if i == 0:
                # evaluates to bool numpy array
                foundit = one_kind_of_test(v, other)
            else:
                # raises exception for numpy array
                foundit = another_kind_of_test(v, other)
            if foundit:
                break
        return foundit

I'm trying to imagine some other way to get the results mentioned but I honestly can't. It's beyond me why someone would do such a thing, but perhaps it's an optimization of some sort.

James
Re: Rich Comparisons Gotcha
On Sun, 07 Dec 2008 16:23:59 +, Rasmus Fogh wrote:
> Just to keep you from shooting at straw men: I would have liked it to be part of the design contract (a convention, if you like) that
> 1) bool(x == y) should return a boolean and never throw an error

Can't be done without making bool a magic function. If x==y raises an exception, bool() won't even be called. The only way around that would be for the Python compiler to recognise bool(x==y) and perform special magic. What if you did this?

    trueorfalse = bool  # I don't like George Boole
    trueorfalse( [x][0].__class__.__getattr__('__dict__')['__eq__'](y) )

Should that have special magic performed too? Just how much work must the compiler put in to special-casing bool?

> 2) x == x return True

Which goes against the IEEE 754 floating-point standard.

http://grouper.ieee.org/groups/754/

Python used to optimize x==x and always return True. This was removed because it caused problems.

> I do *not* say that bool(x) should never throw an error. I do *not* say that Python should guess a return value if an __eq__ function throws an error,

But to get what you want, the above is implied. I suppose, just barely, that you could avoid making bool() magic and just make "if" magic. When the compiler sees "if expr:" it could swallow all exceptions inside expr and force it to evaluate to True or False. (How? By guessing? Randomly?) This would cause many problems, but it could be done, and much easier than ensuring that bool(x) always succeeds.

> only that it should have been considered a bug, or at least bad form, for __eq__ functions to do so.

It's certainly *unusual* for comparisons to return non-bools, but it's not bad form.

> What might be a sensible behaviour (unlike your proposed wrapper)

What do you dislike about my wrapper class? Perhaps it is fixable.

> would be the following:
>
>     def eq(x, y):
>         if x is y:
>             return True

I've already mentioned NaNs. Sentinel values also sometimes need to compare not equal with themselves. Forcing them to compare equal will cause breakage.

>         else:
>             try:
>                 return (x == y)
>             except Exception:
>                 return False

Why False? Why not True? If an error occurs inside __eq__, how do you know that the correct result was False?

    class Broken(object):
        def __eq__(self, other):
            return Treu  # oops, raises NameError

--
Steven
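Both objections can be tried out directly. The sketch below reproduces the eq() wrapper and the Broken class as quoted in the message above; the variable names a, b and nan are introduced here only for illustration.

```python
def eq(x, y):
    """The wrapper proposed in the quoted text, reproduced as a sketch."""
    if x is y:
        return True
    try:
        return x == y
    except Exception:
        return False


class Broken(object):
    def __eq__(self, other):
        return Treu   # deliberate typo from the thread: raises NameError when called


a, b = Broken(), Broken()

# The NameError is swallowed, so the bug looks like an ordinary "not equal".
print(eq(a, b))       # False

# The identity shortcut also overrides IEEE 754 semantics for NaN.
nan = float('nan')
print(nan == nan)     # False
print(eq(nan, nan))   # True
```

In the first case a programming error is silently reported as inequality; in the second the wrapper disagrees with the plain == operator.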
Re: Rich Comparisons Gotcha
James Stroud wrote:
> Robert Kern wrote:
>> James Stroud wrote:
>>> I'm missing how a.all() solves the problem Rasmus describes, namely that the order of a python *list* affects the results of containment tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different results in his example. It still seems like a bug in numpy to me, even if too much other stuff is broken if you fix it (in which case it apparently becomes an issue).
>>
>> It's an issue, if anything, not a bug. There is no consistent implementation of bool(some_array) that works in all cases. numpy's predecessor Numeric used to implement this as returning True if at least one element was non-zero. This works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not work well for bool(x==y) (which should be (x==y).all()), but many people got confused and thought that bool(x==y) worked. When we made numpy, we decided to explicitly not allow bool(some_array) so that people will not write buggy code like this again.
>>
>> The deficiency is in the feature of rich comparisons, not numpy's implementation of it. __eq__() is allowed to return non-booleans; however, there are some parts of Python's implementation like list.__contains__() that still expect the return value of __eq__() to be meaningfully cast to a boolean.
>
> You have explained
>
>     py> ll2 = [1, y]
>     py> y in ll2
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     ValueError: The truth value of an array with more than one element is...
>
> but not
>
>     py> ll1 = [y,1]
>     py> y in ll1
>     True
>
> It's this discrepancy that seems like a bug, not that a ValueError is raised in the former case, which is perfectly reasonable to me.

Nothing to do with numpy. list.__contains__() checks for identity with "is" before it goes to __eq__().

--
Robert Kern
Re: Rich Comparisons Gotcha
Robert Kern wrote:
> James Stroud wrote:
>>     py> ll2 = [1, y]
>>     py> y in ll2
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in <module>
>>     ValueError: The truth value of an array with more than one element is...
>>
>> but not
>>
>>     py> ll1 = [y,1]
>>     py> y in ll1
>>     True
>>
>> It's this discrepancy that seems like a bug, not that a ValueError is raised in the former case, which is perfectly reasonable to me.
>
> Nothing to do with numpy. list.__contains__() checks for identity with "is" before it goes to __eq__().

...but only for the first element of the list:

    py> import numpy
    py> y = numpy.array([1,2,3])
    py> y
    array([1, 2, 3])
    py> y in [1, y]
    Traceback (most recent call last):
      File "<ipython console>", line 1, in <module>
    <type 'exceptions.ValueError'>: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    py> y is [1, y][1]
    True

I think it skips straight to __eq__ if the element is not the first in the list.

That no one acknowledges this makes me feel like a conspiracy is afoot.
Re: Rich Comparisons Gotcha
James Stroud wrote:
> Robert Kern wrote:
>> James Stroud wrote:
>>>     py> ll2 = [1, y]
>>>     py> y in ll2
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     ValueError: The truth value of an array with more than one element is...
>>>
>>> but not
>>>
>>>     py> ll1 = [y,1]
>>>     py> y in ll1
>>>     True
>>>
>>> It's this discrepancy that seems like a bug, not that a ValueError is raised in the former case, which is perfectly reasonable to me.
>>
>> Nothing to do with numpy. list.__contains__() checks for identity with "is" before it goes to __eq__().
>
> ...but only for the first element of the list:
>
>     py> import numpy
>     py> y = numpy.array([1,2,3])
>     py> y
>     array([1, 2, 3])
>     py> y in [1, y]
>     Traceback (most recent call last):
>       File "<ipython console>", line 1, in <module>
>     <type 'exceptions.ValueError'>: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>     py> y is [1, y][1]
>     True
>
> I think it skips straight to __eq__ if the element is not the first in the list.

No, it doesn't skip straight to __eq__(). "y is 1" returns False, so (y==1) is checked. When y is a numpy array, this returns an array of bools. list.__contains__() tries to convert this array to a bool and ndarray.__nonzero__() raises the exception.

list.__contains__() checks "is" then __eq__() for each element before moving on to the next element. It does not try "is" for all elements, then try __eq__() for all elements.

> That no one acknowledges this makes me feel like a conspiracy is afoot.

I don't know what you think I'm not acknowledging.

--
Robert Kern
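The per-element order described above (identity first, then ==, then on to the next element) can be written out in pure Python. This is a sketch of the observable behaviour, not CPython's C source; list_contains, Ambiguous and NoTruthValue are hypothetical names introduced here, with Ambiguous standing in for a numpy array so the example needs no third-party package.

```python
def list_contains(items, target):
    """Pure-Python sketch of how `target in items` behaves for a list."""
    for element in items:
        if element is target:     # identity is tried first ...
            return True
        if element == target:     # ... then ==, whose result goes through bool()
            return True
    return False


class NoTruthValue(object):
    """Comparison result that refuses to be converted to a bool."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")

    __nonzero__ = __bool__   # Python 2 name


class Ambiguous(object):
    """Hypothetical stand-in for a numpy array: == works, bool() of the result fails."""

    def __eq__(self, other):
        return NoTruthValue()


y = Ambiguous()

print(y in [y, 1])                # True: identity matches at index 0, == never runs
print(list_contains([y, 1], y))   # True, for the same reason

try:
    y in [1, y]                   # the comparison with 1 runs first and bool() fails
except ValueError as exc:
    print("ValueError:", exc)
```

The list order matters only because the search stops at the first element that matches by identity; any element visited before that point is compared with == and its result converted to a bool.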
Re: Rich Comparisons Gotcha
Rasmus Fogh wrote:
> Dear All,
>
> For the first time I have come across a Python feature that seems completely wrong. After the introduction of rich comparisons, equality comparison does not have to return a truth value, and may indeed return nothing at all and throw an error instead. As a result, code like
>
>     if foo == bar:
>
> or
>
>     foo in alist
>
> cannot be relied on to work.
>
> This is clearly no accident. According to the documentation all comparison operators are allowed to return non-booleans, or to throw errors. There is explicitly no guarantee that x == x is True.

You have touched on a real and known issue that accompanies dynamic typing and the design of Python. *Every* Python function can return any Python object and may raise any exception either actively, by design, or passively, by not catching exceptions raised in the functions *it* calls.

> Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed,

What you are calling a misfeature is an absence, not a presence that can be removed.

> and constrain the __eq__ function to always return a truth value.

It is impossible to do that with certainty by any mechanical creation-time checking. So the implementation of operator.eq would have to check the return value of the ob.__eq__ function it calls *every time*. That would slow down the 99.xx% of cases where the check is not needed and would still not prevent exceptions. And if the return value was bad, all operator.eq could do is raise an exception anyway.

> That is clearly not likely to happen. Unless I have misunderstood something, could somebody explain to me.

a. See above.
b. Python programmers are allowed to define 'weird' but possibly useful-in-context behaviors, such as to try out 3-value logic, or to operate on collections element by element (as with numpy).

> 1) Why was this introduced?

The 6 comparisons were previously done with one __cmp__ function that was supposed to return -1, 0, or 1 and which worked with negative, 0, or positive response, but which could return anything or raise an exception. The compare functions could mask but not prevent weird returns.

> I can understand relaxing the restrictions on '<', '>=' etc. - after all you cannot define an ordering for all types of object. But surely you can define an equal/unequal classification for all types of object, if you want to? Is it just the numpy people wanting to type 'a == b' instead of 'equals(a,b)', or is there a better reason?
>
> 2) If I want to write generic code, can I somehow work around the fact that
>
>     if foo == bar:
>
> or
>
>     foo in alist
>
> does not work for arbitrary objects?

Every Python function is 'generic' unless restrained by type tests. However, even 'generic' functions can only work as expected with objects that meet the assumptions embodied in the function. In my Python-based algorithm book-in-progress, I am stating this explicitly. In particular, I say that the book only applies to objects for which '==' gives a boolean result that is reflexive, symmetric, and transitive. This excludes float('nan'), for instance (as I see you discovered), which follows the IEEE mandate to act otherwise.

> CCPN has a table display class that maintains a list of arbitrary objects, one per line in the table. The table class is completely generic,

but only for the objects that meet the implied assumption. This is true for *all* Python code. If you want to apply the function to other objects, you must either adapt the function or adapt or wrap the objects to give them an interface that does meet the assumptions.

> and subclassed for individual cases. It contains the code:
>
>     if foo in tbllist:
>         ...
>     else:
>         ...
>         tbllist.append(foo)
>         ...
>
> One day the 'if' statement gave this rather obscure error:
>
>     ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>
> A subclass had used objects passed in from some third party code, and as it turned out foo happened to be a tuple containing a tuple containing a numpy array.

Right. 'in' calls '==' and assumes a boolean return. Assumption violated, exception raised. Completely normal. The error message even suggests a solution: wrap the offending objects in an adaptor class that gives them a normal interface with .all (or perhaps the all() builtin).

Terry Jan Reedy
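One way the suggested adaptor could look is sketched below. Everything here is an assumption made for illustration: BoolEq is a hypothetical wrapper name, and Vec and Mask are stand-ins for a numpy array and its element-wise comparison result, so that the example runs with the standard library alone.

```python
class Mask(object):
    """Element-wise comparison result with no truth value of its own."""

    def __init__(self, flags):
        self.flags = flags

    def all(self):
        return all(self.flags)

    def __bool__(self):
        raise ValueError("truth value is ambiguous; use any() or all()")

    __nonzero__ = __bool__   # Python 2 name


class Vec(object):
    """Hypothetical stand-in for a numpy array (element-wise ==)."""

    def __init__(self, *values):
        self.values = values

    def __eq__(self, other):
        if not isinstance(other, Vec):
            return NotImplemented
        return Mask([a == b for a, b in zip(self.values, other.values)])


class BoolEq(object):
    """Adaptor: wraps an object so that == always returns a plain bool."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __eq__(self, other):
        if isinstance(other, BoolEq):
            other = other.wrapped
        if self.wrapped is other:
            return True
        result = self.wrapped == other
        # Collapse an element-wise result with all(); anything else goes through bool().
        if hasattr(result, "all"):
            return bool(result.all())
        return bool(result)

    def __ne__(self, other):
        return not self == other


# The table pattern from the quoted message, with wrapped items.
table = []
for item in (Vec(1, 2), Vec(1, 2), Vec(3, 4)):
    wrapped = BoolEq(item)
    if wrapped not in table:      # the membership test now receives real booleans
        table.append(wrapped)

print(len(table))   # 2: the duplicate Vec(1, 2) was recognised
```

The design choice to collapse with all() matches the reading "equal means equal everywhere"; a caller wanting a different notion of equality would have to pick it explicitly in the adaptor.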
Re: Rich Comparisons Gotcha
Terry Reedy wrote:
> Rasmus Fogh wrote:
>> Dear All,
>>
>> For the first time I have come across a Python feature that seems completely wrong. After the introduction of rich comparisons, equality comparison does not have to return a truth value, and may indeed return nothing at all and throw an error instead. As a result, code like
>>
>>     if foo == bar:
>>
>> or
>>
>>     foo in alist
>>
>> cannot be relied on to work.
>>
>> This is clearly no accident. According to the documentation all comparison operators are allowed to return non-booleans, or to throw errors. There is explicitly no guarantee that x == x is True.
>
> You have touched on a real and known issue that accompanies dynamic typing and the design of Python. *Every* Python function can return any Python object and may raise any exception either actively, by design, or passively, by not catching exceptions raised in the functions *it* calls.
>
>> Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed,
>
> What you are calling a misfeature is an absence, not a presence that can be removed.

That's not quite true. Rich comparisons explicitly allow non-boolean return values. Breaking up __cmp__ into multiple __special__ methods was not the sole purpose of rich comparisons. One of the prime examples at the time was numpy (well, Numeric at the time). We wanted to use == to be able to return an array with boolean values where the two operand arrays were equal. E.g.

    In [1]: from numpy import *
    In [2]: array([1, 2, 3]) == array([4, 2, 3])
    Out[2]: array([False, True, True], dtype=bool)

SQLAlchemy uses these operators to build up objects that will be turned into SQL expressions.

    print users.c.id==addresses.c.user_id
    users.id = addresses.user_id

Basically, the idea was to turn these operators into full-fledged operators like +-/*. Returning a non-boolean violates neither the letter, nor the spirit of the feature.

Unfortunately, if you do overload __eq__ to build up expressions or whatnot, the other places where users of __eq__ are implicitly expecting a boolean break. While I was (and am) a supporter of rich comparisons, I feel Rasmus's pain from time to time. It would be nice to have an alternate method to express the boolean "yes, this thing is equal in value to that other thing". Unfortunately, I haven't figured out a good way to fit it in now without sacrificing rich comparisons entirely.

>> and constrain the __eq__ function to always return a truth value.
>
> It is impossible to do that with certainty by any mechanical creation-time checking. So the implementation of operator.eq would have to check the return value of the ob.__eq__ function it calls *every time*. That would slow down the 99.xx% of cases where the check is not needed and would still not prevent exceptions. And if the return value was bad, all operator.eq could do is raise an exception anyway.

Sure, but then it would be a bug to return a non-boolean from __eq__ and friends. It is not a bug today. I think that's what Rasmus is proposing.

--
Robert Kern
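The expression-building use of == mentioned above, and the way it trips up code that implicitly expects a boolean, can be sketched in a few lines. Column and Equals are hypothetical classes written for this illustration; they are not SQLAlchemy's API and only imitate the printed output shown in the message.

```python
class Equals(object):
    """Deferred comparison that renders as SQL-like text."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __str__(self):
        return "%s = %s" % (self.left, self.right)


class Column(object):
    """Hypothetical column reference; not SQLAlchemy's real class."""

    def __init__(self, table, name):
        self.table = table
        self.name = name

    def __str__(self):
        return "%s.%s" % (self.table, self.name)

    def __eq__(self, other):
        # Build an expression object instead of answering True/False.
        return Equals(self, other)

    __hash__ = object.__hash__   # stay hashable despite overriding __eq__


users_id = Column("users", "id")
addresses_user_id = Column("addresses", "user_id")

expr = users_id == addresses_user_id
print(expr)   # users.id = addresses.user_id

# Equals defines no __bool__, so it is truthy by default: a membership
# test "finds" a column that is not actually in the list.
print(users_id in [addresses_user_id])   # True
```

Unlike the numpy case, nothing raises here; the membership test quietly returns a wrong answer, which is arguably the harder failure to notice.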