Hi All, My research involves genomic research and the use of sets (recently introduced in version 2.4) makes my life easier in a lot of ways. However, I noticed that working with large set slows the script to unbearable speed.
Below I compare two simple scripts, one makes use of a list and the other makes use of a set. The difference of 20 seconds may not be much, but let me tell you that this difference grows exponentially. When my sets reach more that 100,000 elements, a union command is painfully slow. True, a union command of a set may be much more complicated than using a list but the time difference is simply too big for me. Any thoughts/suggestions? I suppose that the obvious solution is to work with lists most of the time and use the sets only when I really need them. But then, what's the point of having those set? Consider the following example just for making my point: (sets were run using MacPython 2.4.1 on a PowerMac G4 dual 1.25) ############################ from time import time x = set([]) prev_time = time() initial_time = time() for i in range(20000): x = x.union(set([i])) if i % 1000 == 0: print i, "done. seconds since last: %.3f" % (time()-prev_time) prev_time = time() print "done in %.3f" %(time() - initial_time) ############################ The output of this script is: 0 done. seconds since last: 0.002 1000 done. seconds since last: 0.103 2000 done. seconds since last: 0.354 3000 done. seconds since last: 0.784 4000 done. seconds since last: 0.861 5000 done. seconds since last: 1.501 6000 done. seconds since last: 1.707 7000 done. seconds since last: 1.751 8000 done. seconds since last: 1.894 9000 done. seconds since last: 2.907 10000 done. seconds since last: 3.155 11000 done. seconds since last: 3.278 12000 done. seconds since last: 3.374 13000 done. seconds since last: 3.559 14000 done. seconds since last: 3.824 15000 done. seconds since last: 3.905 16000 done. seconds since last: 4.040 17000 done. seconds since last: 5.853 18000 done. seconds since last: 6.928 19000 done. seconds since last: 7.038 done in 64.414 Lets run the same script using a list: ############################ from time import time x = [] prev_time = time() initial_time = time() for i in range(20000): if i not in x: x.append(i) if i % 1000 == 0: print i, "done. seconds since last: %.3f" % (time()-prev_time) prev_time = time() print "done in %.3f" %(time() - initial_time) ############################ The output of this script is: 0 done. seconds since last: 0.002 1000 done. seconds since last: 0.102 2000 done. seconds since last: 0.323 3000 done. seconds since last: 0.506 4000 done. seconds since last: 0.676 5000 done. seconds since last: 0.871 6000 done. seconds since last: 1.051 7000 done. seconds since last: 1.242 8000 done. seconds since last: 1.365 9000 done. seconds since last: 1.635 10000 done. seconds since last: 1.760 11000 done. seconds since last: 1.929 12000 done. seconds since last: 2.133 13000 done. seconds since last: 2.294 14000 done. seconds since last: 2.659 15000 done. seconds since last: 3.015 16000 done. seconds since last: 3.182 17000 done. seconds since last: 3.332 18000 done. seconds since last: 3.552 19000 done. seconds since last: 3.690 done in 39.197 _______________________________________________ Pythonmac-SIG maillist - Pythonmac-SIG@python.org http://mail.python.org/mailman/listinfo/pythonmac-sig