Bruno Cauet added the comment:
Here is an updated patch based on Dustin's work with Josh's comments. I also
added a test which takes forever on an unpatched python interpreter.
Since it's a performance issue, I've benchmarked the results. They don't change
for the most part (argument is a set or a dict) but they're way better for
iterables.
For every type of argument I test 1 case where "set.issubset" returns True and
1 case where it returns False.
(a) simple argument (results unchanged)
$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1000))"
"s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 63.7 usec per loop
Patched: 10000 loops, best of 3: 63.5 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1, 1000))"
"s1.issubset(s2)"
Unpatched: 1000000 loops, best of 3: 0.248 usec per loop
Patched: 1000000 loops, best of 3: 0.25 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000)); s2 =
dict(enumerate(range(1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 107 usec per loop
Patched: 10000 loops, best of 3: 108 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1,
1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 43.5 usec per loop
Patched: 10000 loops, best of 3: 42.6 usec per loop
(b) iterable argument (speed improvement)
1) no improvements/slight degradation when everything must be consumed
$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 263 usec per loop
Patched: 1000 loops, best of 3: 263 usec per loop
$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1, 1000))"
Unpatched: 10000 loops, best of 3: 201 usec per loop
Patched: 1000 loops, best of 3: 259 usec per loop
$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1, 1000))"
Unpatched: 1000 loops, best of 3: 198 usec per loop
Patched: 1000 loops, best of 3: 218 usec per loop
2) tremendous improvements when it can return early
$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 209 usec per loop
Patched: 100000 loops, best of 3: 12.1 usec per loop
$ ./python -m timeit -s "s1 = set('a'); s2 = ['a'] + ['b'] * 10000"
"s1.issubset(s2)"
Unpatched: 1000 loops, best of 3: 368 usec per loop
Patched: 1000000 loops, best of 3: 0.934 usec per loop
$ ./python -m timeit -s "s1 = set('a'); from itertools import repeat"
"s1.issubset(repeat('a'))"
Unpatched: NEVER FINISHES
Patched: 1000000 loops, best of 3: 1.33 usec per loop
----------
nosy: +bru, haypo
Added file:
http://bugs.python.org/file37295/0001-Improve-set.issubset-performance-on-iterables.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18032>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com