Josh Rosenberg <[email protected]> added the comment:
To be clear, set().__or__(x) returns NotImplemented, it doesn't raise
NotImplementedError. I've edited the title to match.
One major problem that gets in the way of a fix is that the interface of set
and dict views doesn't match, because dict views only provide the interface of
collections.abc.Set, and collections.abc.Set *only* requires the operator
overloads, not the named equivalent methods. And because the named equivalents
don't exist, the operator overloads were made more liberal, to accept any
iterable, not just set or set-like things.
set's rules are:
1. Named methods take varargs, and accept any iterable(s)
2. Operator overloads take only one argument, which must be a set/frozenset
dict view's rules are:
1. There are no named methods (aside from isdisjoint, which has no operator
equivalent)
2. Operator overloads take only one argument, which can be any iterable
So fixing this is problematic so long as we want to keep a simple
implementation; right now, all the methods simplify to:
1. Convert to set
2. Call named, in-place method on resulting set to get liberal behavior
Fixing this would involve either:
1. Rewriting all the dict views operator overloads to do the work themselves,
including the logic to reject non-iterables by returning NotImplemented like
set's operators do.
Pro: Not a hack. Resulting code would be more efficient in many cases (e.g.
dict view's __and__ could make a new empty set [currently makes set with a copy
of the view's values], and iterate the right hand side while performing
membership tests in the underlying dict before inserting into the result set,
avoiding the need to make a large set that will almost certainly shrink,
possibly requiring rehashes)
Con: Lots of new code to maintain. Would not benefit from any algorithmic
improvements to equivalent set code without mirroring them manually.
2. Checking for TypeError from the set method and explicitly returning
NotImplemented.
Pro: Easy, code remains maintainable, benefits from any future
optimizations to set
Cons: Can't distinguish fundamental problems that should be TypeErrors
(right hand side contains unhashable objects) from simple case that should
return NotImplemented (right hand side isn't iterable). Potentially fixable by
explicitly performing the conversion to iterator on the dict view's side
(though you'd only do it to confirm it's possible; you'd want to pass the
original object unmodified for the common case where it's a set/frozenset and
could be processed more efficiently). Also makes failure case even slower than
it already is (exception is raised, then ignored, then NotImplemented is
returned, then right-hand side is checked, and odds are, the right-hand side
won't know what to do either, so it'll return NotImplemented anyway, and a
TypeError gets thrown after more work).
3. Refactor the set API to expose (private, C level) helpers for the operator
overloads (specifically, the *in-place* versions like __ior__, since we're
always making a new true set to call it on anyway, and it seems silly to use
__or__ which would make yet another set, and force us to throw away the first
set we made) that are less strict about their arguments.
Pros: Roughly the same as #2, plus a minor per-call (not per item)
performance boost: C helpers could be called directly, avoiding overhead of
calling Python level methods via _PyObject_CallMethodIdOneArg(result,
&PyId_update, other); invoking dict lookups and the like when we know we just
made a true set object, so there is no possibility of update being overridden,
so a direct C level call would save time. Continues to benefit from any
optimizations to set's code (since 95% of the code remains the same).
Con: More code churn than #2; unfortunately, unlike list (which has type
loose in-place operator overloads), set's in-place operator overloads are still
type strict; alist += {1,2,3} works, but aset |= [1,2,3] does not, so we can't
just call set's __ixor__ unmodified, we'd need a _PySet_IXorIterable separate
from the existing set_ior, where both of them might be implemented in terms of
set_update_internal, but _PySet_IXorIterable only checks that the argument is
iterable to return NotImplemented.
#1 and #3 seem to be the only fully correct options; #1 could squeeze out more
performance, but requires a *lot* more new code, #3 requires a lot less new
code, and gets a tiny performance boost.
I don't see explicit calls to __or__ being all that common though, and I have a
hard time envisioning a class that would hit this case without explicitly
calling __or__; for `{}.keys() | x` the only time you'd see a behavioral
difference from `set() | x` is when:
1. x is not iterable, and
2. x defines a __ror__ that can do something meaningful with a set
I'd personally assume anything that meets criterion #2 would be iterable;
anyone have a counter-example? If this is only a theoretical problem, might it
make sense to just leave it as is?
----------
nosy: +josh.r
title: Inconsistent behavior between set and dict_keys/dict_items: for
non-iterable object x, set().__or__(x) raises NotImplementedError, but
{}.keys().__or__(x) raises TypeError -> Inconsistent behavior between set and
dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns
NotImplemented, but {}.keys().__or__(x) raises TypeError
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue24413>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com