Andrew Dalke <[email protected]> added the comment:
My belief is that the people who use set.intersection with more than two terms
are 1) going to pass in a list of sets, and 2) don't care about the specific
order.
To check the validity of my belief, I did a Google Code Search to find cases of
people using set intersection in Python. I searched for "set\.intersection\(\*"
and "\.intersection\(.*\, lang:^python$", among others.
I am sad to report that the most common way to compute set.intersection(*list)
is by using reduce, like:
possible = (set(index[c]) for c in set(otp))
possible = reduce(lambda a, b: a.intersection(b), possible)
That comes from:
git://github.com/Kami/python-yubico-client.git /yubico/modhex.py
and similar uses are in:
git://github.com/sburns/PyCap.git /redcap/rc.py
http://hltdi-l3.googlecode.com/hg//xdg/languages/morpho/fst.py
http://dsniff.googlecode.com/svn/trunk/dsniff/lib/fcap.py
As well as in the Rosetta Code example for a simple inverted index, at:
http://rosettacode.org/wiki/Inverted_index#Python
This was also implemented more verbosely in:
http://eats.googlecode.com/svn/trunk/server/eats/views/main.py
intersected_set = sets[0]
for i in range(1, len(sets)):
intersected_set = intersected_set.intersection(sets[i])
and
http://iocbio.googlecode.com/svn/trunk/iocbio/microscope/cluster_tools.py
s = set (range (len (data[0])))
for d in zip(*data):
s = s.intersection(set(find_outliers(d, zoffset=zoffset)))
return sorted(s)
In other words, 7 codebases use manual pairwise reduction rather than use the
equivalent code in Python. (I have not checked for which are due to backwards
compatibility requirements.)
On the other hand, if someone really wants to have a specific intersection
order, this shows that it's very easy to write.
I found 4 other code bases where set intersection was used for something other
than binary intersection, and used the built-in intersection().
git://github.com/valda/wryebash.git/experimental/bait/bait/presenter/impl/filters.py
def get_visible_node_ids(self, filterId):
if filterId in self.idMask:
visibleNodeIdSets = [f.get_visible_node_ids(filterId) for f in
self._filters]
return set.intersection(*[v for v in visibleNodeIdSets if v is not
None])
return None
http://wallproxy.googlecode.com/svn/trunk/local/proxy.py
if threads[ct].intersection(*threads.itervalues()):
raise ValueError('All threads failed')
(here, threads' values contain sets)
git://github.com/argriffing/xgcode.git/20100623a.py
header_sets = [set(x) for x in header_list]
header_intersection = set.intersection(*header_sets)
http://pyvenn.googlecode.com/hg//venn.py
to_exclude = set()
for ii in xrange(0, len(self.sets)):
if (i & (2**ii)):
sets_to_intersect.append(sets_by_power_of_two[i & (2**ii)])
else:
to_exclude = to_exclude.union(sets_by_power_of_two[(2**ii)])
final = set.intersection(*sets_to_intersect) - to_exclude
These all find the intersection of sets (not iterators), and the order of
evaluation does not appear like it will affect the result.
I do not know though if there will be a performance advantage in these cases to
reordering. I do know that in my code, and any inverted index, there is an
advantage.
And I do know that the current CPython implementation has bad worst-case
performance.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13653>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com