On Thu, 2020-01-23 at 18:36 -0800, Guido van Rossum wrote:
> Good question!
> 

It is. What follows is mostly lamenting, so just to say: my personal gut
feeling is that the use of this shortcut, e.g. by most containers, should
probably be considered an "implementation detail". But besides the fact
that it sometimes leads to unexpected behaviour, I am not sure I have any
actual reasons. (Unless some typing JIT could run into it?)

> I think this started with a valuable optimization for `x in <list>`.
> I don't know if that was ever carefully documented, but I remember
> that it was discussed a few times (and IIRC Raymond was adamant that
> this should be so optimized -- which is reasonable).
> 
> I'm tempted to declare this implementation-defined behavior --
> *implicit* calls to __eq__ and __ne__ *may* be skipped if both sides
> are the same object depending on the whim of the implementation.
> 
> We should probably also strongly recommend that __eq__ and __ne__ not
> do what math.nan does.
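
To make the distinction between explicit and implicit calls concrete, here
is a minimal sketch. `NeverEqual` is a hypothetical class invented for
illustration (it is not from CPython or NumPy); it counts how often its
`__eq__` actually runs.

```python
import math


class NeverEqual:
    """Hypothetical NaN-like object: __eq__ always answers False."""

    def __init__(self):
        self.eq_calls = 0  # counts how often __eq__ actually runs

    def __eq__(self, other):
        self.eq_calls += 1
        return False


x = NeverEqual()

explicit = (x == x)            # explicit ==: __eq__ runs
calls_after_explicit = x.eq_calls

contained = x in [x]           # implicit comparison inside list.__contains__
counted = [x, x, x].count(x)   # implicit comparison inside list.count
calls_after_implicit = x.eq_calls

print(explicit, contained, counted)
print(calls_after_explicit, calls_after_implicit)

# The same effect with the float NaN from the original report.
print(math.nan == math.nan, math.nan in [math.nan])
```

On CPython I would expect `False True 3`, then `1 1` (the container
operations never call `__eq__`), then `False True`. Another implementation
would be free to differ if this is declared implementation-defined.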


Other objects similar to this are masked values (which e.g. pandas is
looking at). [0]
In their current definition, masked values would have to behave in a
similar way to NumPy arrays, since `bool(NA == NA) -> bool(boolean_NA)`
is an error rather than `True`.
These objects are rare but hard to avoid completely, I guess...
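
A rough sketch of what that means for containers. `MaskedValue` is a
hypothetical stand-in written for this mail, not the pandas implementation;
the only assumptions are that `==` returns the masked value itself and that
its truth value is an error.

```python
class MaskedValue:
    """Hypothetical stand-in for a masked/NA scalar (not the pandas API)."""

    def __eq__(self, other):
        # Comparing with a missing value gives a missing value back.
        return self

    def __bool__(self):
        raise TypeError("truth value of a masked value is ambiguous")

    def __repr__(self):
        return "NA"


NA = MaskedValue()


def attempt(func):
    """Run func() and return its result, or the name of the TypeError."""
    try:
        return func()
    except TypeError as exc:
        return type(exc).__name__


explicit = attempt(lambda: bool(NA == NA))   # __eq__ runs, bool(NA) raises
same_object = attempt(lambda: NA in [NA])    # identity shortcut, no __eq__
other_object = attempt(lambda: 1 in [NA])    # __eq__ runs, bool(NA) raises

print(explicit, same_object, other_object)
```

On CPython the membership test succeeds only when the very same object is
in the list; any other lookup hits the error from `__bool__`.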

> 
> However we cannot stop rich compare __eq__ implementations that
> return arrays of pairwise comparisons, since numpy does this. (And
> yes, it seems that this means that `x in y` is computed incorrectly
> if x is an array with such an __eq__ implementation and y is a tuple
> of such objects. I'm sure there's a big warning somewhere in the
> numpy docs about this, and I presume if y is a numpy array they make
> sure to do something better.)
> 

I somewhat doubt there is a big warning currently...

In NumPy we actually stopped using `PyObject_RichCompareBool` within
`np.equal` a pretty long time ago [1]. IIRC we perceived it as a
bug.
However, as you said, object arrays, which would succeed randomly [2],
were probably the more important motivation (rather than NaN).
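
The difference between the two approaches can be sketched in plain Python.
Both helper functions below are hypothetical names made up for this mail
and are not how `np.equal` is implemented; they only contrast an
element-wise comparison that trusts identity first with one that always
evaluates `==`.

```python
import math


def equal_with_shortcut(xs, ys):
    """Element-wise equality that trusts identity first, in the spirit of
    PyObject_RichCompareBool: `a is b` short-circuits to True."""
    return [a is b or bool(a == b) for a, b in zip(xs, ys)]


def equal_without_shortcut(xs, ys):
    """Element-wise equality that always evaluates `a == b`."""
    return [bool(a == b) for a, b in zip(xs, ys)]


nan = math.nan
xs = [1.0, nan]
ys = [1.0, nan]   # holds the very same nan object as xs

with_shortcut = equal_with_shortcut(xs, ys)
without_shortcut = equal_without_shortcut(xs, ys)

print(with_shortcut)
print(without_shortcut)
```

With the shortcut the NaN entry compares as equal (`[True, True]`);
without it the result follows `__eq__` (`[True, False]`).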

I do not think anyone has ever evaluated the performance impact of that
change though...

- Sebastian


[0] 
https://github.com/pandas-dev/pandas/pull/29597/files#diff-239ec95d581257ed256954660663b277R825-R827

[1] 
https://numpy.org/devdocs/release/1.13.0-notes.html#futurewarning-to-changed-behavior

[2] For those not familiar with NumPy: for NumPy arrays, `[1, 2] == [1, 2]`
returns `[True, True]`, but with the identity shortcut it may return `True`
(if both sides are the same object). The comparison should raise an error,
because `[True, True]` does not generally have a defined truth value, but
it will succeed randomly.
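
A small pure-Python sketch of this footnote. `Elementwise` is a
hypothetical class standing in for an array with pairwise `==`; it is an
assumption for illustration and does not use NumPy itself.

```python
class Elementwise:
    """Hypothetical stand-in for an array whose == is element-wise."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Pairwise comparison, returning another Elementwise of booleans.
        return Elementwise(a == b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # Like an array with more than one element: no single truth value.
        if len(self.values) != 1:
            raise ValueError("truth value of a multi-element result is ambiguous")
        return bool(self.values[0])


def contains(item, container):
    """Return the result of `item in container`, or the error type name."""
    try:
        return item in container
    except ValueError as exc:
        return type(exc).__name__


a = Elementwise([1, 2])
b = Elementwise([1, 2])   # equal contents, but a different object

same_object = contains(a, (a,))    # identity shortcut: no __eq__ call
other_object = contains(a, (b,))   # __eq__ runs, bool() of the result raises

print(same_object, other_object)
```

On CPython this gives `True` for the same object and a `ValueError` for an
equal but distinct one, which is the "succeeds randomly" part.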




> On Thu, Jan 23, 2020 at 5:33 PM Tim Peters <tim.pet...@gmail.com>
> wrote:
> > PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut:  if x
> > and y are the same object, then equality comparison returns True and
> > inequality False.  No attempt is made to execute __eq__ or __ne__
> > methods in those cases.
> > 
> > This has visible consequences all over the place, but they don't
> > appear to be documented.  For example,
> > 
> > >>> import math
> > >>> ([math.nan] * 5).count(math.nan)
> > 5
> > 
> > despite that `math.nan == math.nan` is False.
> > 
> > It's usually clear which methods will be called, and when, but not
> > really here.  Any _context_ that calls PyObject_RichCompareBool()
> > under the covers, for an equality or inequality test, may or may not
> > invoke __eq__ or __ne__, depending on whether the comparands are the
> > same object.  Also any context that inlines these special cases to
> > avoid the overhead of calling PyObject_RichCompareBool() at all.
> > 
> > If it's intended that Python-the-language requires this, that needs to
> > be documented.
> > 
> > Or if it's implementation-defined, then _that_ needs to be documented.
> > 
> > Which isn't straightforward in either case, in part because
> > PyObject_RichCompareBool isn't a language-level concept.
> > 
> > This came up recently when someone _noticed_ the list.count(NaN)
> > behavior, and Victor made a PR to document it:
> > 
> > https://github.com/python/cpython/pull/18130
> > 
> > I'm pushing back, because documenting it _only_ for .count() makes
> > .count() seem unique in a way it isn't, and doesn't resolve the
> > fundamental issue:  is this language behavior, or implementation
> > behavior?
> > 
> > Which I don't want to argue about.  But you certainly should ;-)
> > _______________________________________________
> > Python-Dev mailing list -- python-dev@python.org
> > To unsubscribe send an email to python-dev-le...@python.org
> > https://mail.python.org/mailman3/lists/python-dev.python.org/
> > Message archived at 
> > https://mail.python.org/archives/list/python-dev@python.org/message/3ZAMS473HGHSI64XB3UV4XBICTG2DKVF/
> > Code of Conduct: http://python.org/psf/codeofconduct/


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ISAL3SWZTBP3QW7WJ5BXOYDC5QFOY63Z/
Code of Conduct: http://python.org/psf/codeofconduct/
