[issue24413] Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().or(x) returns NotImplemented, but {}.keys().or(x) raises TypeError

Josh Rosenberg Wed, 07 Aug 2019 12:51:41 -0700


Josh Rosenberg <[email protected]> added the comment:


To be clear, set().__or__(x) returns NotImplemented, it doesn't raise 
NotImplementedError. I've edited the title to match.

One major problem that gets in the way of a fix is that the interface of set 
and dict views doesn't match, because dict views only provide the interface of 
collections.abc.Set, and collections.abc.Set *only* requires the operator 
overloads, not the named equivalent methods. And because the named equivalents 
don't exist, the operator overloads were made more liberal, to accept any 
iterable, not just set or set-like things.

set's rules are:

1. Named methods take varargs, and accept any iterable(s)
2. Operator overloads take only one argument, which must be a set/frozenset

dict view's rules are:

1. There are no named methods (aside from isdisjoint, which has no operator 
equivalent)
2. Operator overloads take only one argument, which can be any iterable

So fixing this is problematic so long as we want to keep a simple 
implementation; right now, all the methods simplify to:

1. Convert to set
2. Call named, in-place method on resulting set to get liberal behavior

Fixing this would involve either:

1. Rewriting all the dict views operator overloads to do the work themselves, 
including the logic to reject non-iterables by returning NotImplemented like 
set's operators do.

    Pro: Not a hack. Resulting code would be more efficient in many cases (e.g. 
dict view's __and__ could make a new empty set [currently makes set with a copy 
of the view's values], and iterate the right hand side while performing 
membership tests in the underlying dict before inserting into the result set, 
avoiding the need to make a large set that will almost certainly shrink, 
possibly requiring rehashes)

    Con: Lots of new code to maintain. Would not benefit from any algorithmic 
improvements to equivalent set code without mirroring them manually.

2. Checking for TypeError from the set method and explicitly returning 
NotImplemented.

    Pro: Easy, code remains maintainable, benefits from any future 
optimizations to set

    Cons: Can't distinguish fundamental problems that should be TypeErrors 
(right hand side contains unhashable objects) from simple case that should 
return NotImplemented (right hand side isn't iterable). Potentially fixable by 
explicitly performing the conversion to iterator on the dict view's side 
(though you'd only do it to confirm it's possible; you'd want to pass the 
original object unmodified for the common case where it's a set/frozenset and 
could be processed more efficiently). Also makes failure case even slower than 
it already is (exception is raised, then ignored, then NotImplemented is 
returned, then right-hand side is checked, and odds are, the right-hand side 
won't know what to do either, so it'll return NotImplemented anyway, and a 
TypeError gets thrown after more work).

3. Refactor the set API to expose (private, C level) helpers for the operator 
overloads (specifically, the *in-place* versions like __ior__, since we're 
always making a new true set to call it on anyway, and it seems silly to use 
__or__ which would make yet another set, and force us to throw away the first 
set we made) that are less strict about their arguments.

    Pros: Roughly the same as #2, plus a minor per-call (not per item) 
performance boost: C helpers could be called directly, avoiding overhead of 
calling Python level methods via _PyObject_CallMethodIdOneArg(result, 
&PyId_update, other); invoking dict lookups and the like when we know we just 
made a true set object, so there is no possibility of update being overridden, 
so a direct C level call would save time. Continues to benefit from any 
optimizations to set's code (since 95% of the code remains the same).

    Con: More code churn than #2; unfortunately, unlike list (which has type 
loose in-place operator overloads), set's in-place operator overloads are still 
type strict; alist += {1,2,3} works, but aset |= [1,2,3] does not, so we can't 
just call set's __ixor__ unmodified, we'd need a _PySet_IXorIterable separate 
from the existing set_ior, where both of them might be implemented in terms of 
set_update_internal, but _PySet_IXorIterable only checks that the argument is 
iterable to return NotImplemented.

#1 and #3 seem to be the only fully correct options; #1 could squeeze out more 
performance, but requires a *lot* more new code, #3 requires a lot less new 
code, and gets a tiny performance boost.

I don't see explicit calls to __or__ being all that common though, and I have a 
hard time envisioning a class that would hit this case without explicitly 
calling __or__; for `{}.keys() | x` the only time you'd see a behavioral 
difference from `set() | x` is when:

1. x is not iterable, and
2. x defines a __ror__ that can do something meaningful with a set

I'd personally assume anything that meets criterion #2 would be iterable; 
anyone have a counter-example? If this is only a theoretical problem, might it 
make sense to just leave it as is?

----------
nosy: +josh.r
title: Inconsistent behavior between set and dict_keys/dict_items: for 
non-iterable object x, set().__or__(x) raises NotImplementedError, but 
{}.keys().__or__(x) raises TypeError -> Inconsistent behavior between set and 
dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns 
NotImplemented, but {}.keys().__or__(x) raises TypeError

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue24413>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue24413] Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns NotImplemented, but {}.keys().__or__(x) raises TypeError

Reply via email to

[issue24413] Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().or(x) returns NotImplemented, but {}.keys().or(x) raises TypeError