Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> There is nothing to blame them for. This is the correct behaviour. NaNs
> should *not* compare equal to themselves, that's mathematically
> incoherent.
Indeed. The problem is a paucity of equality predicates.

This is hardly surprising: Common Lisp has four general-purpose equality predicates (EQ, EQL, EQUAL and EQUALP), and many more type-specific ones (=, STRING=, STRING-EQUAL (yes, I know...), CHAR=, ...), and still doesn't really have enough. For example, EQUAL compares strings case-sensitively, but compares other arrays by address; EQUALP will recurse into arbitrary arrays, but compares strings case-insensitively...

For the purposes of this discussion, however, it has enough to be able to distinguish between

  * numerical comparisons, which (as you explain later) should /not/ claim that two NaNs are equal, and

  * object comparisons, which clearly must declare an object equal to itself.

For example, I had the following edifying conversation with SBCL.

CL-USER> ;; Return NaNs rather than signalling errors.
         (sb-int:set-floating-point-modes :traps nil)
; No value
CL-USER> (defconstant nan (/ 0.0 0.0))
NAN
CL-USER> (loop for func in '(eql equal equalp =)
               collect (list func (funcall func nan nan)))
((EQL T) (EQUAL T) (EQUALP T) (= NIL))
CL-USER>

That is, a NaN is EQL, EQUAL and EQUALP to itself, but not = to itself. (Due to the vagaries of EQ, a NaN might or might not be EQ to itself or to other NaNs.)

Python has a much more limited selection of equality predicates: in fact, just == and is. The is operator is Python's equivalent of Lisp's EQ predicate: it compares objects by address. I can have a similar chat with Python.

In [12]: nan = float('nan')

In [13]: nan is nan
Out[13]: True

In [14]: nan == nan
Out[14]: False

In [16]: nan is float('nan')
Out[16]: False

Python numbers are reliably the same as themselves, unlike in Lisp. But there's no sensible way of asking whether something is `basically the same as' nan, like Lisp's EQL or EQUAL.

I agree that the primary equality predicate for numbers must be the numerical comparison, and NaNs can't (sensibly) be numerically equal to themselves.
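Python has no built-in counterpart to EQL, but a rough one can be written by hand. The sketch below defines a hypothetical helper, `eql`, which is not part of Python or any library. Its policy (identity first, then same type, then treating any two float NaNs as the same, then plain ==) is an assumption chosen for illustration and does not reproduce Common Lisp's exact rules; for instance, it does not distinguish 0.0 from -0.0, which EQL does when the implementation supports signed zeros.

```python
import math


def eql(a, b):
    """Hypothetical EQL-like predicate: is `b` "basically the same as" `a`?

    Not part of Python or any library; one possible policy, sketched
    for illustration only.
    """
    if a is b:
        # Identity always implies sameness, as with Lisp's EQ.
        return True
    if type(a) is not type(b):
        # Like Lisp's EQL, don't identify values of different types (1 vs 1.0).
        return False
    if isinstance(a, float) and math.isnan(a) and math.isnan(b):
        # Assumed policy: any two float NaNs count as the same value.
        return True
    # Otherwise fall back on ordinary (numerical) equality.
    return a == b


if __name__ == "__main__":
    nan = float("nan")
    print(nan == nan)               # False: numerical comparison
    print(nan is float("nan"))      # False: two distinct objects
    print(eql(nan, float("nan")))   # True under this sketch's policy
    print(eql(1, 1.0))              # False: different types
```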
Address comparisons are great when you're dealing with singletons, or when you carefully intern your objects. In other cases, you're left with ==. This puts a great deal of responsibility on the programmer of an == method, who must carefully weigh the potentially conflicting demands of compatibility (many other libraries simply expect == to be an equality operator returning a straightforward truth value, and given that there isn't a separate dedicated equality operator, this isn't unreasonable) against doing something more usefully domain-specific.

It's worth pointing out that numpy isn't unique in having == not return a straightforward truth value. The SAGE computer algebra system implements the == operator on symbolic expressions so as to construct equations. (sympy, by contrast, keeps == as a structural comparison returning a bool, and builds equations with Eq().) For example, the following is syntactically and semantically Python, with fancy libraries.

sage: var('x')  # x is now a variable
x
sage: solve(x**2 + 2*x - 4 == 1)
[x == -sqrt(6) - 1, x == sqrt(6) - 1]

(SAGE has some syntactic tweaks, such as ^ meaning the same as **, but I didn't use them.)

I think this is an excellent use of the == operator, but it does have some potential to interfere with other libraries which make assumptions about how == behaves. The SAGE developers have been clever here, though: an equation object also has a truth value, so code that just tests whether two things are equal still gets a sensible answer.

sage: 2*x + 1 == (2 + 4*x)/2
2*x + 1 == (4*x + 2)/2
sage: bool(2*x + 1 == (2 + 4*x)/2)
True
sage: bool(2*x + 1 == (2 + 4*x)/3)
False

I think Python manages surprisingly well with its limited equality predicates. But the keyword there is `surprisingly', and it may not continue this trick forever.

-- 
[mdw]
-- 
http://mail.python.org/mailman/listinfo/python-list