[issue43464] set intersections should short-circuit

2022-04-06 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed
versions: +Python 3.11 -Python 3.10

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43464] set intersections should short-circuit

2022-04-06 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 31cd25f4e17cd68487dc76c1b2ec162a646818c2 by Serhiy Storchaka in 
branch 'main':
bpo-43464: Optimize set.intersection() for non-set arguments (GH-31316)
https://github.com/python/cpython/commit/31cd25f4e17cd68487dc76c1b2ec162a646818c2


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43464] set intersections should short-circuit

2022-02-13 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

See also issue46721. set.issuperset() can get similar optimization.

And set.issubset() will benefit from this optimization if use 
set.intersection() in issue46705.

--
keywords:  -patch
stage: patch review -> 

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43464] set intersections should short-circuit

2022-02-13 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
nosy: +serhiy.storchaka
nosy_count: 2.0 -> 3.0
pull_requests: +29477
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31316

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43464] set intersections should short-circuit

2021-03-10 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43464] set intersections should short-circuit

2021-03-10 Thread Josh Rosenberg


New submission from Josh Rosenberg :

At present, set_intersection (the C name for set.intersection) optimizes for 
pairs of sets by iterating the smallest set and only adding entries found in 
the larger, meaning work is proportionate to the smallest input.

But when the other input isn't a set, it goes with a naive solution, iterating 
the entire non-set, and adding entries found in the set. This is fine when the 
intersection will end up smaller than the original set (there's no way to avoid 
exhausting the non-set when that's the case), but when the intersection ends up 
being the same size as the original, we could add a cheap length check and 
short-circuit at that point.

As is, {4}.intersection(range(1)) takes close to 1000 times longer than 
{4}.intersection(range(10)) despite both of them reaching the point where the 
outcome will be {4} at the same time.

Since the length check for short-circuiting only needs to be performed when 
input set actually contains the value, the cost should be fairly low.

I figure this would be the worst case for the change:

{3, 4}.intersection((4,) * 1)

where it performs the length check every time, and doesn't benefit from 
short-circuiting. But cases like:

{4}.intersection((4,) * 1)

or

{4}.intersection(range(1))

would finish much faster. A similar optimization to set_intersection_multi (to 
stop when the intermediate result is empty) would make cases like:

{4000}.intersection([1], range(1), range(10, 20))

also finish dramatically quicker in the (I'd assume not uncommon case) where 
the intersection of many iterables is empty, and this could be known quite 
early on (the cost of this check would be even lower, since it would only be 
performed once per iterable, not per-value).

Only behavioral change this would cause is that errors resulting from 
processing items in an iterable that is no longer run to exhaustion due to 
short-circuiting wouldn't happen ({4}.intersection([4, []]) currently dies, but 
would succeed with short-circuiting; same foes for {4}.intersection([5], [[]]) 
if set_intersection_multi is optimized), and input iterators might be left only 
partially consumed. If that's acceptable, the required code changes are trivial.

--
components: C API
keywords: easy (C)
messages: 388442
nosy: josh.r
priority: normal
severity: normal
status: open
title: set intersections should short-circuit
versions: Python 3.10

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com