This has everything to do with the garbage collector and the overhead of allocating more memory to the Python process. I ran some more tests, including running the 'dict w lists' code multiple times:
$ python tester.py
runtimes with memory empty
dict w/o lists: 0.0260519981384
dict w lists:   0.0418920516968
dict w lists:   0.0497941970825
loading data
runtimes with memory occupied
dict w/o lists: 0.0242760181427
dict w lists:   1.14558005333
dict w lists:   0.52930688858

It would appear that the first run to create a bunch of lists while memory is occupied incurs some extra overhead, probably due to memory allocation by the OS to the Python process. Subsequent runs don't have this problem, since they can reuse the memory freed after the prior run. Interestingly, pypy doesn't appear to exhibit this behavior to the same degree, probably because its GC algorithm differs from CPython's:

$ pypy tester.py
runtimes with memory empty
dict w/o lists: 0.217184782028
dict w lists:   0.0508198738098
dict w lists:   0.0368840694427
loading data
runtimes with memory occupied
dict w/o lists: 0.0227448940277
dict w lists:   0.0329740047455
dict w lists:   0.0272088050842

While investigating this, I found yet another strange behavior: allocate a bunch of memory, create a huge number of lists, delete them, and the next run involving lists is somehow even faster than the constants case. Code is here: http://pastebin.com/GdMNkM2f

And the results:

runtimes with memory empty
dict w/o lists: 0.0253469944
dict w lists:   0.0402500629425
dict w lists:   0.0478730201721
dict w lists:   0.0460600852966
loading data
freeing some memory
runtimes with memory occupied
dict w/o lists: 0.0804419517517
dict w lists:   0.0254349708557   <---- What!?
dict w lists:   0.523415803909
dict w lists:   0.418542861938

On Fri, Aug 5, 2011 at 10:42 AM, Ryan Roser <[email protected]> wrote:
> Hi,
>
> I'm trying to improve the performance of some of my code. I've noticed that
> one of the bottlenecks involves making a large dictionary where the values
> are lists. Making a large dictionary is fast, repeatedly creating lists is
> fast, but things slow down if I set the lists as values for the dictionary.
> Interestingly, this slowdown only occurs if there is already data in memory
> in Python, and things get increasingly slow as the amount of memory used
> increases.
>
> I have a toy example demonstrating the behavior below. Do you know why this
> is happening? Is there a problem with my test? Does Python do something
> special when storing lists as values in dictionaries? Is there a workaround
> or an alternative data structure that doesn't exhibit slowdown as Python's
> memory usage increases?
>
> Thanks for the help,
>
> Ryan
>
> ####################################
> ## A test script
> ####################################
> import time
> import random
>
> x = range(100000)
>
> def test():
>     # Creating a dictionary with an entry for each element in x
>     # is fast, and so is repeatedly creating a list
>     start = time.time()
>     d = dict()
>     for i in x:
>         tmp = []
>         tmp.append('something')
>         d[i] = 1
>     print 'dict w/o lists:', time.time() - start
>
>     # but assigning the list to the dictionary gets very slow
>     # if memory is not empty
>     start = time.time()
>     d = dict()
>     for i in x:
>         tmp = []
>         tmp.append('something')
>         d[i] = tmp
>     print 'dict w lists:  ', time.time() - start
>
> print 'runtimes with memory empty'
> test()
> print 'loading data'
> data = [random.random() for i in xrange(30000000)]  # ~1gb of mem
> print 'runtimes with memory occupied'
> test()
> ####################################
>
> Results:
>
> $ python2.4 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0506901741028
> dict w lists:   0.0766770839691
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0391671657562
> dict w lists:   2.18966984749
>
> $ python2.6 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0479600429535
> dict w lists:   0.0784649848938
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0361380577087
> dict w lists:   2.49754095078
>
> $ python2.7 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0464890003204
> dict w lists:   0.0735650062561
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0356121063232
> dict w lists:   2.49307012558
>
> ######## Python versions and machine info #########
> Machine has 32 gb of ram, 8 cores
>
> Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
>
> ActivePython 2.6.5.14 (ActiveState Software Inc.) based on
> Python 2.6.5 (r265:79063, Jul 5 2010, 10:31:13)
> [GCC 4.0.0 20050519 (Red Hat 4.0.0-8)] on linux2
>
> Python 2.7.1 (r271:86832, May 25 2011, 13:34:05)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
>
> $ uname -a
> Linux research-team10 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009
> x86_64 x86_64 x86_64 GNU/Linux
>
> _______________________________________________
> Portland mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/portland
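For what it's worth, the slowdown in the original test is consistent with CPython's cyclic garbage collector: every list is a tracked container object, and a collection pass triggered during bulk allocation has to traverse every live tracked object, so creating lists gets slower as the heap fills up. A quick way to check this (and a common workaround) is to switch the collector off around the bulk allocation with gc.disable(). Here's a minimal sketch in Python 3 syntax; the function name and the 1,000,000-element filler list are mine, not from tester.py:

```python
import gc
import time

N = 100000

def build_dict_of_lists():
    """Time the slow case from tester.py: a fresh list as every dict value."""
    start = time.time()
    d = {}
    for i in range(N):
        d[i] = ['something']
    return d, time.time() - start

# Baseline with a mostly empty heap.
_, t_empty = build_dict_of_lists()

# Occupy memory with a million tracked container objects, standing in
# for the ~1gb float list in the original script.
data = [[i] for i in range(1000000)]
_, t_occupied = build_dict_of_lists()

# Workaround: suspend the cyclic collector during bulk allocation.
# (Reference counting still frees garbage; only cycle detection pauses.)
gc.disable()
try:
    _, t_gc_off = build_dict_of_lists()
finally:
    gc.enable()

print('empty: %.4fs  occupied: %.4fs  gc disabled: %.4fs'
      % (t_empty, t_occupied, t_gc_off))
```

On the Python 2 interpreters above I'd expect the gc-disabled run to land much closer to the empty-heap run; later CPython releases tuned the full-collection heuristics, so the size of the gap will vary by version. If disabling collection outright is too blunt, gc.set_threshold() is the gentler knob.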
