On May 30, 2005, at 7:44 AM, Yair Benita wrote: > My research involves genomic research and the use of sets (recently > introduced in version 2.4) makes my life easier in a lot of ways. > However, I > noticed that working with large set slows the script to unbearable > speed.
Sets were actually introduced in Python 2.3, in the sets module, but were introduced as a built-in in Python 2.4. > Below I compare two simple scripts, one makes use of a list and the > other > makes use of a set. The difference of 20 seconds may not be much, > but let me > tell you that this difference grows exponentially. When my sets > reach more > that 100,000 elements, a union command is painfully slow. True, a > union > command of a set may be much more complicated than using a list but > the time > difference is simply too big for me. > > Any thoughts/suggestions? Yeah, don't write code like that. If you don't want exponential time, don't write algorithms that will take exponential time :) Use the right algorithm. Your two examples aren't equivalent. It has exponential time because union takes two sets to return a third. In your example, you are uselessly creating two new sets (and a single-element list) on every iteration: one set with one item, and one set with N items. What you really want to be using is the add or update method, which mutates the set in-place. The list example is nowhere near equivalent. To compare apples to apples the list example should be: if i not in x: x = x + [i] and that would be much slower than either of your examples! On my 1ghz G4 laptop: Your set example: 87.921s Your list example: 51.542s My set example (using x.add(i)): 0.095s > I suppose that the obvious solution is to work with lists most of > the time > and use the sets only when I really need them. But then, what's the > point of > having those set? What's the point of having Python if you can write faster code in assembly? Where did you learn to use lists as a set anyway? The canonical pre-2.3 example was to use a dict as a set (since the keys are a set). -bob _______________________________________________ Pythonmac-SIG maillist - Pythonmac-SIG@python.org http://mail.python.org/mailman/listinfo/pythonmac-sig