[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

Raymond Hettinger Mon, 03 Feb 2020 00:41:39 -0800

> PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x and y are 
> the same object, then equality comparison returns True and inequality False. 
> No attempt is made to execute __eq__ or __ne__ methods in those cases.
> 
> This has visible consequences all over the place, but they don't appear to be 
> documented. For example,
> 
> ...
> despite that math.nan == math.nan is False.
> 
> It's usually clear which methods will be called, and when, but not really 
> here. Any _context_ that calls PyObject_RichCompareBool() under the covers, 
> for an equality or inequality test, may or may not invoke __eq__ or __ne__, 
> depending on whether the comparands are the same object. Also any context 
> that inlines these special cases to avoid the overhead of calling 
> PyObject_RichCompareBool() at all.
> 
> If it's intended that Python-the-language requires this, that needs to be 
> documented.


This has been documented slowly, and perhaps incompletely, over the years, and 
it has become baked into some of the collections ABCs as well.  For example, 
Sequence.__contains__() is defined as:

    def __contains__(self, value):
        for v in self:
            if v is value or v == value:          # note the identity test
                return True
        return False
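
To see the effect of that identity test, here is a minimal sketch. The 
Tuplish class is hypothetical (not from the standard library); it only 
supplies __getitem__ and __len__ and inherits __contains__ from the 
Sequence mixin. The behavior shown assumes current CPython.

```python
from collections.abc import Sequence


class Tuplish(Sequence):
    """Hypothetical minimal Sequence; __contains__ comes from the ABC mixin."""

    def __init__(self, *items):
        self._items = tuple(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


nan = float("nan")
seq = Tuplish(1.0, nan)

same_object_found = nan in seq             # identity test succeeds
other_nan_found = float("nan") in seq      # distinct object, and NaN != NaN

assert nan != nan
assert same_object_found
assert not other_nan_found
```

The same NaN object is found even though it does not compare equal to 
itself, while a second, distinct NaN object is not found.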

Various collections need to assume reflexivity, not just for speed, but so that 
we can reason about them and so that they can maintain internal consistency. 
For example, MutableSet defines pop() as:

    def pop(self):
        """Return the popped value.  Raise KeyError if empty."""
        it = iter(self)
        try:
            value = next(it)
        except StopIteration:
            raise KeyError from None
        self.discard(value)
        return value

That pop() logic implicitly assumes an invariant between membership and 
iteration:

       assert all(x in collection for x in collection)

We really don't want to pop() a value *x* and then find that *x* is still in 
the container.   This would happen if iter() found the *x*, but discard() 
couldn't find the object because the object can't or won't recognize itself:

     s = {float('NaN')}
     s.pop()
     assert not s                  # Do we want the language to guarantee that
                                   # s is now empty?  I think we must.
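
A rough sketch of the failure mode described above. ListSet is a 
hypothetical list-backed MutableSet written only for illustration; its 
use_identity flag (also an assumption of this sketch) switches the 
identity shortcut on or off. pop() is inherited unchanged from the 
MutableSet mixin.

```python
from collections.abc import MutableSet


class ListSet(MutableSet):
    """Hypothetical list-backed set; use_identity toggles the shortcut."""

    def __init__(self, items=(), use_identity=True):
        self._items = []
        self._use_identity = use_identity
        for item in items:
            self.add(item)

    def _same(self, a, b):
        # With the shortcut, an object always matches itself.
        return (self._use_identity and a is b) or a == b

    def __contains__(self, value):
        return any(self._same(v, value) for v in self._items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if value not in self:
            self._items.append(value)

    def discard(self, value):
        self._items = [v for v in self._items if not self._same(v, value)]


nan = float("nan")

strict = ListSet([nan], use_identity=False)
popped = strict.pop()          # iter() finds nan, discard() cannot match it
strict_size = len(strict)      # nan is still in the container

lenient = ListSet([nan], use_identity=True)
lenient.pop()
lenient_size = len(lenient)    # container is empty, as expected
```

Without the shortcut, pop() returns the NaN but leaves it in the 
container, so the inherited clear() would never terminate on such a set.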

The code for clear() depends on pop() working:

    def clear(self):
        """This is slow (creates N new iterators!) but effective."""
        try:
            while True:
                self.pop()
        except KeyError:
            pass

It would be unfortunate if clear() could not guarantee a post-condition that the 
container is empty:

     s = {float('NaN')}
     s.clear()
     assert not s           # Can this be allowed to fail?

The case of count() is less clear-cut, but even there identity-implies-equality 
improves our ability to reason about code:  Given some list, *s*, possibly 
already populated, would you want the following code to always work:

     c = s.count(x)
     s.append(x)
     assert s.count(x) == c + 1         # To me, this is fundamental to what
                                        # the word "count" means.
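
As a small illustration, assuming current CPython list behavior, the 
invariant holds for a NaN because list.count() treats an object as equal 
to itself. The equality_only_count name is just for this sketch and shows 
what a purely ==-based count would report.

```python
nan = float("nan")
s = [1.0, nan, 2.0]

c = s.count(nan)
s.append(nan)
count_after = s.count(nan)

# A count based on == alone never matches a NaN, not even with itself.
equality_only_count = sum(1 for v in s if v == nan)

assert count_after == c + 1
assert equality_only_count == 0
```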

I can't find it now, but remember a possibly related discussion where we 
collectively rejected a proposal for an __is__() method.  IIRC, the reasoning 
was that our ability to think about code correctly depended on this being true:

    a = b
    assert a is b

Back to the discussion at hand, I had thought our position was roughly:

* __eq__ can return anything it wants.

* Containers are allowed but not required to assume that 
identity-implies-equality.

* Python's core containers make that assumption so that we can keep
  the containers internally consistent and so that we can reason about
  the results of operations.

Also, I believe that even very early dict code (at least as far back as Py 
1.5.2) had logic for "v is value or v == value".
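
A short sketch of the same effect in a present-day dict (this shows 
current CPython behavior and says nothing about the 1.5.2 source): a NaN 
key can be looked up through the same object, but not through a different 
NaN object.

```python
nan = float("nan")
d = {nan: "found"}

value = d[nan]                          # same object: lookup succeeds
other_nan_present = float("nan") in d   # distinct NaN object: no match

assert nan != nan
assert value == "found"
assert not other_nan_present
```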

As far as NaNs go, the only question is how far to propagate their notion of 
irreflexivity. Should "x == x" return False for them? We've decided yes.  When 
it comes to containers, who makes the rules, the containers or their elements?  
Mostly, we let the elements rule, but containers are allowed to make useful 
assumptions about the elements when necessary.  This isn't much different than 
the rules for the "==" operator where __eq__() can return whatever it wants, 
but functions are still allowed to write "if x == y: ..." and assume that a 
meaningful boolean value has been returned (even if it wasn't).  Likewise, the 
rule for "<" is that it can return whatever it wants, but sorted() and min() 
are allowed to assume a meaningful total ordering (which might or might not be 
true).  In other words, containers and functions are allowed, when necessary or 
useful, to override the decisions made by their data.   This seems like a 
reasonable state of affairs.
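
Two small sketches of functions overriding the decisions of their data, 
assuming current CPython. The Vague class and the same() helper are 
hypothetical names introduced for illustration. The first part shows 
"if x == y" treating a non-boolean result as a truth value; the second 
shows min() assuming an ordering that NaN does not provide, so the answer 
depends on the input order.

```python
import math


class Vague:
    """Hypothetical class whose __eq__ returns a non-boolean value."""

    def __eq__(self, other):
        return "maybe"


def same(x, y):
    # The caller simply trusts the truthiness of whatever __eq__ returned.
    if x == y:
        return True
    return False


vague_result = same(Vague(), object())   # "maybe" is truthy

nan = float("nan")
first = min([nan, 1.0])     # 1.0 < nan is False, so nan is kept
second = min([1.0, nan])    # nan < 1.0 is False, so 1.0 is kept

assert vague_result is True
assert math.isnan(first)
assert second == 1.0
```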

The current docs make an effort to describe what we have now: 
https://docs.python.org/3/reference/expressions.html#value-comparisons 

Sorry for the lack of concision.  I'm posting on borrowed time,


Raymond





  
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UIZPD7OJRVID4EMO5WI7FUX6BR7XLR5D/
Code of Conduct: http://python.org/psf/codeofconduct/
