Istvan Albert wrote:

> I've been debugging the reason for a major slowdown in a piece of
> code ... and it turns out that it was the zip function. In the past
> the lists that were zipped were reasonably short, but once the size
> exceeded 10 million the zip function slowed to a crawl. Note that
> there was memory available to store over 100 million items.
>
> Now I know that zip() wastes lots of memory because it copies the
> content of the lists. I had used zip to try to trade memory for speed
> (heh!), and now that everything was replaced with izip it works just
> fine. What was really surprising is that it works with no issues up
> until 1 million items, but for say 10 million it pretty much goes
> nuts. Does anyone know why? Is there some limit that it reaches, or is
> there something about the operating system (Vista in this case) that
> makes it behave like so?
>
> I've noticed the same kind of behavior when trying to create very
> long lists that should easily fit into memory, yet above a given
> threshold I get inexplicable slowdowns. Now that I think about it, is
> this something about the way lists grow when expanding them?
>
> And here is the code:
>
> from itertools import izip
>
> BIGNUM = int(1E7)
>
> # let's make a large list
> data = range(BIGNUM)
>
> # this works fine (uses about 200 MB and 4 seconds)
> s = 0
> for x in data:
>     s += x
> print s
>
> # this works fine, 4 seconds as well
> s = 0
> for x1, x2 in izip(data, data):
>     s += x1
> print s
>
> # this takes over 2 minutes! and uses 600 MB of memory
> # the memory usage slowly ticks upwards
> s = 0
> for x1, x2 in zip(data, data):
>     s += x1
> print s
When you are allocating a lot of objects without releasing them, the garbage collector kicks in to look for reference cycles. Try switching it off:

import gc

gc.disable()
try:
    pass  # do the zipping
finally:
    gc.enable()

Peter
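
The threshold behaviour reported above matches CPython's collector heuristic: a cycle-detection pass is triggered whenever allocations outnumber deallocations by a fixed count (700 by default, see gc.get_threshold()), so building ten million tuples forces repeated passes over an ever-growing set of tracked objects. Here is a minimal sketch of the workaround applied to the original test; it assumes Python 2 (range() returning a list, print as a statement), and the timing code is only illustrative:

import gc
import time

BIGNUM = int(1E7)
data = range(BIGNUM)   # a real list of ten million ints in Python 2

gc.disable()           # suspend cycle detection while zip builds its big list
try:
    start = time.time()
    s = 0
    for x1, x2 in zip(data, data):
        s += x1
    print s, 'took', time.time() - start, 'seconds'
finally:
    gc.enable()        # restore collection even if the loop raises

With the collector disabled, the zip() version should run in roughly the same time as the izip() version, at the cost of the extra memory for the list of tuples.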