s1 = set(['a','b','c']) s2 = set(['a','c']) s3 = set(['a','d','e','f']) s4 = set(['r','k','l']) s5 = set(['r','k','l']) ls = [s1,s2,s3,s4,s5] result1 = [s1, s3, s5]
A result can contain s4 or s5 at random. This problem looks like the one of searching the correct hyperedges for a hypergraph. s1-5 are the hyperedges, and "a", "b",... are the nodes. Probably this problem is already solved somewhere. . # You can start removing the duplicated sets (slow operation): . ls2 = list(set(map(frozenset, ls))) . # Then I've found a solution similar to the first Raymond Hettinger one: . result = [] . for si in ls2: . for sj in result: . if si <= sj: . break . else: . result.append(si) . print result This can be slow, but it probably can be made a bit faster version, but it's not easy. You can put the nodes in a undirected graph that represents the hypergraph, with other "extra nodes" that represent the hyperedges already in result). . print list(enumerate(map(list,ls2))) . # Output: [(0,['k','r','l']),(1,['a','c','b']),(2,['a','e','d','f']),(3,['a','c'])] . # Probably you can work on ls too, avoiding the slow creation of ls2, but you have to cheek . # more things later. . import graph . g = graph.Graph() . for n1,s in enumerate(ls2): . for n2 in s: . g.addBiArc(n1, n2) # add edges and nodes as needed . . # then you can find the connected components of the graph . print g # Output: 0-k,l,r 1-a,b,c 2-a,d,e,f 3-a,c . cc = g.connectedComponents() . # now cc = [[0, 'k', 'r', 'l'], [1, 'a', 'c', 'b', 2, 3, 'e', 'd', 'f']] . # and you can filter the hyperedges. This can be improved a lot: . ccf = [ [n for n in c if type(n)==int] for c in cc ] . # now ccf = [[0], [1, 2, 3]] Eppstein unionFind data structure can also be used for a similar purpose. 0-th element of ls2 is set(['r','k','l']). It's totally disjoint from the others. Now you don't have to look every sj in result, but only the sj of result that are inside its connected ones. For example, you don't have to test set(['a','c']) against set(['r','k','l']). I don't know how much faster this can be compared to the simple O(len(ls)^2) program seen before... But it can be useful for situations with lots of sets. Bye, Bearophile -- http://mail.python.org/mailman/listinfo/python-list