[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

Tim Peters Mon, 03 Feb 2020 14:11:34 -0800

[Tim]
>> PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x
>> and y are the same object, then equality comparison returns True
>> and inequality False. No attempt is made to execute __eq__ or
>> __ne__ methods in those cases.
>> ...
>> If it's intended that Python-the-language requires this, that needs to
>> be documented.
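
A minimal sketch of the shortcut being discussed, using a NaN as the standard example of an object that is not equal to itself. The behaviour shown is what CPython does; whether other implementations must match it is exactly the open question.

```python
# Illustrative only: observed CPython behaviour, not a language guarantee.
nan = float("nan")

eq_result = (nan == nan)                 # plain ==: float.__eq__ says False
in_result = (nan in [nan])               # containment: same object, so found
other_nan_in = (float("nan") in [nan])   # a different NaN object: not found

print(eq_result, in_result, other_nan_in)
```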


[Raymond]
> This has been slowly, but perhaps incompletely, documented over the
> years and has become baked into some of the collections ABCs as well.
>  For example, Sequence.__contains__() is defined as:
>
>     def __contains__(self, value):
>         for v in self:
>             if v is value or v == value:          # note the identity test
>                 return True
>         return False

But it's unclear to me whether that's intended to constrain all
implementations, or is just mimicking CPython's list.__contains__.
That's always a problem with operational definitions.  For example,
does it also constrain all implementations to check in iteration
order?  The order can be visible, e.g., in the number of times v.__eq__
is called.
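
One way to see that the search order is observable is to log every comparison. The `Noisy` class below is a hypothetical helper, not anything from the standard library; it records both operands because which one ends up on the left of `==` is itself an implementation detail.

```python
class Noisy:
    """Hypothetical helper: records every __eq__ call and never matches."""
    log = []

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Record both operand names; `other` is always a Noisy in this sketch.
        Noisy.log.append((self.name, other.name))
        return False


needle = Noisy("needle")
haystack = [Noisy("a"), Noisy("b"), Noisy("c")]

found = needle in haystack

# Recover the haystack element involved in each comparison, in call order.
visited = [next(n for n in pair if n != "needle") for pair in Noisy.log]
print(found, len(Noisy.log), visited)
```

An implementation that searched back to front, or stopped comparing early for some other reason, would produce a different log for the same program.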


> Various collections need to assume reflexivity, not just for speed, but so
> that we can reason about them and so that they can maintain internal
> consistency. For example, MutableSet defines pop() as:
>
>     def pop(self):
>         """Return the popped value.  Raise KeyError if empty."""
>         it = iter(self)
>         try:
>             value = next(it)
>         except StopIteration:
>             raise KeyError from None
>         self.discard(value)
>         return value

As above, except CPython's own set implementation doesn't faithfully
conform to that:

>>> x = set(range(0, 10, 2))
>>> next(iter(x))
0
>>> x.pop() # returns first in iteration order
0
>>> x.add(1)
>>> next(iter(x))
1
>>> x.pop()  # ditto
1
>>> x.add(1)  # but try it again!
>>> next(iter(x))
1
>>> x.pop() # oops! didn't pop the first in iteration order
2

Not that I care ;-)  Just emphasizing that it's tricky to say no more
(or less) than what's intended.

> That pop() logic implicitly assumes an invariant between membership and
> iteration:
>
>        assert(x in collection for x in collection)

Missing an "all".
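
A short illustration of why the "all" matters: a bare generator expression is an object, and objects are truthy, so the assert as quoted can never fail regardless of what the container holds. The empty tuple below is just a stand-in for "a container that finds nothing".

```python
collection = [1, 2, 3]

# A generator object is always truthy, even if every test inside it is false.
gen_is_truthy = bool(x in () for x in collection)

# With all(), the membership tests are actually evaluated.
with_all = all(x in collection for x in collection)
never_found = all(x in () for x in collection)

print(gen_is_truthy, with_all, never_found)
```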

> We really don't want to pop() a value *x* and then find that *x* is still
> in the container.   This would happen if iter() found the *x*, but discard()
> couldn't find the object because the object can't or won't recognize itself:

Speaking of which, why is "discard()" called instead of "remove()"?
It's sending a mixed message:  discard() is appropriate when you're
_not_ sure the object being removed is present.
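
For reference, the difference between the two methods on a built-in set:

```python
s = {1, 2}

s.discard(99)            # absent element: silently does nothing

try:
    s.remove(99)         # absent element: raises KeyError
    remove_raised = False
except KeyError:
    remove_raised = True

print(sorted(s), remove_raised)
```

With remove(), a pop() whose element could not be found again would fail loudly instead of returning a value that is still in the container.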


>      s = {float('NaN')}
>      s.pop()
>      assert not s                  # Do we want the language to guarantee that
>                                           # s is now empty?  I think we must.

I can't imagine an actual container implementation that wouldn't, but
no actual container implements pop() in the odd way MutableSet.pop()
is written.  CPython's set.pop does nothing of the sort - doesn't even
have a pointer equality test (except against C's NULL and `dummy`,
used merely to find "the first (starting at the search finger)" slot
actually in use).

In a world where we decided that the identity shortcut is _not_
guaranteed by the language, the real consequence would be that the
MutableSet.pop() implementation would need to be changed (or made
NotImplemented, or documented as being specific to CPython).
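
To make that consequence concrete, here is a sketch of a container that does _not_ take the identity shortcut and inherits MutableSet.pop() unchanged. `EqOnlySet` is a hypothetical class written for this illustration; it is deliberately naive (list-backed, `==` only).

```python
from collections.abc import MutableSet


class EqOnlySet(MutableSet):
    """Hypothetical list-backed set that compares with == only (no `is`)."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def __contains__(self, value):
        return any(value == item for item in self._items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if value not in self:
            self._items.append(value)

    def discard(self, value):
        # Keep everything that does not compare equal; identity is ignored.
        self._items = [item for item in self._items if not (item == value)]


s = EqOnlySet([float("nan")])
popped = s.pop()        # inherited: next(iter(self)), then self.discard(value)
print(popped, len(s))   # the NaN is returned, yet it is still in the container
```

The inherited clear() would never terminate on this object, since pop() keeps succeeding without ever emptying it.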

> The code for clear() depends on pop() working:
>
>     def clear(self):
>         """This is slow (creates N new iterators!) but effective."""
>         try:
>             while True:
>                 self.pop()
>         except KeyError:
>             pass
>
> It would be unfortunate if clear() could not guarantee a post-condition that the
> container is empty:

That's again a consequence of how MutableSet.pop was written.  No
actual container has any problem implementing clear() without needing
any kind of object comparison.
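
A sketch of that point, with a hypothetical `Bag` container whose clear() simply drops its storage. The elements are given an `__eq__` that raises, so the example would fail if clear() compared anything.

```python
class NeverCompare:
    """Hypothetical element type that refuses to be compared."""

    def __eq__(self, other):
        raise AssertionError("clear() should not need to compare elements")

    __hash__ = object.__hash__


class Bag:
    """Hypothetical minimal container backed by a plain list."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def clear(self):
        # Drop the storage wholesale: no __eq__, no identity test, no pop().
        self._items.clear()


b = Bag([NeverCompare(), NeverCompare(), float("nan")])
b.clear()
print(len(b))
```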

>      s = {float('NaN')}
>      s.clear()
>      assert not s           # Can this be allowed to fail?

No, but as above it's a very far stretch to say that clear() emptying
a container _relies_ on the object identity shortcut.  That's just a
consequence of an odd specific clear() implementation, relying in turn
on an odd specific pop() implementation that assumes the shortcut is
in place.


> The case of count() is less clear-cut, but even there
> identity-implies-equality improves our ability to reason about code:

Absolutely!  That "x is x implies equality" is very useful.  But
that's not the question ;-)

>  Given some list, *s*, possibly already populated, would you want the
> following code to always work:
>
>      c = s.count(x)
>      s.append(x)
>      assert s.count(x) == c + 1         # To me, this is fundamental to
>                                         # what the word "count" means.

I would, yes.  But it's also possible to define s.count(x) as

    sum(x == y for y in s)

and live with the consequences of __eq__.
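
The two definitions can be compared side by side. `eq_count` is a hypothetical name for the `==`-only definition above; the results shown for list.count() are CPython's.

```python
def eq_count(seq, x):
    """Count using == alone, with no identity shortcut."""
    return sum(x == y for y in seq)


nan = float("nan")
s = []

before_builtin = s.count(nan)
before_eq = eq_count(s, nan)

s.append(nan)

holds_builtin = (s.count(nan) == before_builtin + 1)   # shortcut: count rises
holds_eq = (eq_count(s, nan) == before_eq + 1)         # __eq__ only: it doesn't

print(holds_builtin, holds_eq)
```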

> ...

> Back to the discussion at hand, I had thought our position was roughly:
>
> * __eq__ can return anything it wants.
>
> * Containers are allowed but not required to assume that
>   identity-implies-equality.
>
> * Python's core containers make that assumption so that we can keep
>   the containers internally consistent and so that we can reason about
>   the results of operations.

All reasonable!  Python just needs something now like a benevolent dictator ;-)

> Also, I believe that even very early dict code (at least as far back
> as Py 1.5.2) had logic for "v is value or v == value".

Memory fades, but it seems to me that very early Pythons may even have
exploited the shortcut for `==` too.

> ...
> The current docs make an effort to describe what we have now: 
> https://docs.python.org/3/reference/expressions.html#value-comparisons

Yes, that's been pointed out, and it's at worst "a good start".  The
people on the original PR that kicked this off weren't aware that
it existed.  Terry Reedy said he's thinking about how to (at least)
make it more discoverable, although at that time Guido appeared to be
leaning "implementation defined" instead.

[in another msg]
>  forgot to mention that list.index() also uses PyObject_RichCompareBool()

A quick scan found about 100 calls to PyObject_RichCompareBool passing
Py_EQ.  So it screams for a way to spell out what's required that
doesn't degenerate into an exhaustive list of specific
functions/methods/contexts.
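
A small sample of how widely the shortcut shows through at the Python level in CPython. This is nowhere near the full set of call sites; it only illustrates why a per-method list would not scale.

```python
nan = float("nan")
s = [nan]

# Each of these finds or matches the NaN only because it is the same object.
results = {
    "in": nan in s,
    "index": s.index(nan),
    "count": s.count(nan),
    "list ==": s == [nan],
    "tuple ==": (nan,) == (nan,),
}

for name, value in results.items():
    print(name, value)
```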
_______________________________________________
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/44XXRXK2MVDY7GKWTURZK7XFCHIR6JRX/