New submission from Andrew Bennetts <s...@users.sourceforge.net>:

set.difference(s), when s is also a set, basically does::

    res = set()
    for elem in self:
        if elem not in other:
            res.add(elem)

This is wasteful when len(self) is much greater than len(other):

$ python -m timeit -s "s = set(range(100000)); sd = s.difference; empty = 
set()" "sd(empty)"
100 loops, best of 3: 12.8 msec per loop
$ python -m timeit -s "s = set(range(10)); sd = s.difference; empty = set()" 
"sd(empty)"
1000000 loops, best of 3: 1.18 usec per loop

Here's a patch that compares the lengths of self and other before that loop, 
and if len(self) is greater, swaps them.  The new timeit results are:

$ python -m timeit -s "s = set(range(100000)); sd = s.difference; empty = 
set()" "sd(empty)"
1000000 loops, best of 3: 0.289 usec per loop
$ python -m timeit -s "s = set(range(10)); sd = s.difference; empty = set()" 
"sd(empty)"
1000000 loops, best of 3: 0.294 usec per loop

----------
components: Interpreter Core
files: set-difference-speedup.diff
keywords: patch
messages: 105489
nosy: spiv
priority: normal
severity: normal
status: open
title: set(range(100000)).difference(set()) is slow
type: performance
versions: Python 2.7
Added file: http://bugs.python.org/file17292/set-difference-speedup.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8685>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to