Dear Bob, your answer makes me feel very stupid and I think you're even angry at me for appearing stupid, or maybe for hinting that python was not good enough. Don't get me wrong, I love python, and will not switch to assembly any time soon.... Of course, I don't use that kind of code it was only an example to make a point that the union command was very slow when the sets get big. I guess it wasn't such a good example.
I usually combine 2 sets using union but one set is much smaller than the other. It is better to do it using "add", as suggested, and that is my solution. I didn't realize that using "add" will not create redundancy, as you said these are like keys in a dictionary so no duplicates can occur. I suppose "union" is worth doing when I have two big sets to be combined. I will be more careful when I post next time. promise. Yair On May 30, 2005, at 6:03 , Bob Ippolito wrote: > > On May 30, 2005, at 7:44 AM, Yair Benita wrote: > > >> My research involves genomic research and the use of sets (recently >> introduced in version 2.4) makes my life easier in a lot of ways. >> However, I >> noticed that working with large set slows the script to unbearable >> speed. >> > > Sets were actually introduced in Python 2.3, in the sets module, > but were introduced as a built-in in Python 2.4. > > >> Below I compare two simple scripts, one makes use of a list and >> the other >> makes use of a set. The difference of 20 seconds may not be much, >> but let me >> tell you that this difference grows exponentially. When my sets >> reach more >> that 100,000 elements, a union command is painfully slow. True, a >> union >> command of a set may be much more complicated than using a list >> but the time >> difference is simply too big for me. >> >> Any thoughts/suggestions? >> > > Yeah, don't write code like that. If you don't want exponential > time, don't write algorithms that will take exponential time :) > Use the right algorithm. > > Your two examples aren't equivalent. It has exponential time > because union takes two sets to return a third. In your example, > you are uselessly creating two new sets (and a single-element list) > on every iteration: one set with one item, and one set with N > items. What you really want to be using is the add or update > method, which mutates the set in-place. The list example is > nowhere near equivalent. To compare apples to apples the list > example should be: > > if i not in x: > x = x + [i] > > and that would be much slower than either of your examples! > > On my 1ghz G4 laptop: > > Your set example: 87.921s > Your list example: 51.542s > My set example (using x.add(i)): 0.095s > > >> I suppose that the obvious solution is to work with lists most of >> the time >> and use the sets only when I really need them. But then, what's >> the point of >> having those set? >> > > What's the point of having Python if you can write faster code in > assembly? > > Where did you learn to use lists as a set anyway? The canonical > pre-2.3 example was to use a dict as a set (since the keys are a set). > > -bob > > _______________________________________________ Pythonmac-SIG maillist - Pythonmac-SIG@python.org http://mail.python.org/mailman/listinfo/pythonmac-sig