On Mon, Dec 22, 2008 at 2:22 PM, Adam Olsen <rha...@gmail.com> wrote: > To make sure that's the correct line please recompile python without > optimizations. GCC happily reorders and merges different parts of a > function. > > Adding a counter in C and recompiling would be a lot faster than using > a gdb hook.
Okay, I did this. The results are the same, except that now sampling selects the different source statements within this loop, instead of just the top of the loop (which makes sense). I added a counter (static volatile long) as suggested, and a breakpoint to sample it. Not every pass through PyObject_Free takes case 3, but for those that do, this loop runs around 100-25000 times. I didn't try to graph it, but based on a quick sample, it looks like more than 5000 iterations on most occasions. The total counter is 12.4 billion at the moment, and still growing. That seems high, but I'm not sure what would be expected or hoped for. I have a script that demonstrates the problem, but unfortunately the behavior isn't clearly bad until large amounts of memory are used. I don't think it shows at 2G, for example. (A 32G machine is sufficient.) Here is a log of running the program at different sizes ($1): 1 4.04686999321 0.696660041809 2 8.1575551033 1.46393489838 3 12.6426320076 2.30558800697 4 16.471298933 3.80377006531 5 20.1461620331 4.96685886383 6 25.150053978 5.48230814934 7 28.9099609852 7.41244196892 8 32.283219099 6.31711483002 9 36.6974511147 7.40236377716 10 40.3126089573 9.01174497604 20 81.7559120655 20.3317198753 30 123.67071104 31.4815018177 40 161.935647011 61.4484620094 50 210.610441923 88.6161060333 60 248.89805007 118.821491003 70 288.944771051 194.166989088 80 329.93295002 262.14109993 90 396.209988832 454.317914009 100 435.610564947 564.191882133 If you plot this, it is clearly quadratic (or worse). Here is the script: #!/usr/bin/env python """ Try to trigger quadratic (?) behavior during .clear() of a large but simple defaultdict. """ from collections import defaultdict import time import sys import gc; gc.disable() print >> sys.stderr, sys.version h = defaultdict(list) n = 0 lasttime = time.time() megs = int(sys.argv[1]) print megs, sys.stdout.flush() # 100M iterations -> ~24GB? on my 64-bit host for i in xrange(megs * 1024 * 1024): s = '%0.7d' % i h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345)) h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345)) h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345)) # if (i % 1000000) == 0: # t = time.time() # print >> sys.stderr, t-lasttime # lasttime = t t = time.time() print t-lasttime, sys.stdout.flush() lasttime = t h.clear() t = time.time() print t-lasttime, sys.stdout.flush() lasttime = t print _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com