> Now, we should find a way to benchmark this without having to steal Mike's > machine and wait 30 minutes every time.
So, I seem to reproduce it. The following script takes about 15 seconds to run and allocates a 2 GB dict which it deletes at the end (gc disabled of course). With 2.4, deleting the dict takes ~1.2 seconds while with 2.5 and higher (including 3.0), deleting the dict takes ~3.5 seconds. Nothing spectacular but the difference is clear. Also, after the dict is deleted and before the program exits, you can witness (with `ps` or `top`) that 2.5 and higher has reclaimed 1GB, while 2.4 has reclaimed nothing. There is a sleep() call at the end so that you have the time :-) You can tune memory occupation at the beginning of the script, but the lower the more difficult it will be to witness a difference. Regards Antoine. ####### import random import time import gc import itertools # Adjust this parameter according to your system RAM! target_size = int(2.0 * 1024**3) # 2.0 GB pool_size = 4 * 1024 # This is a ballpark estimate: 60 bytes overhead for each # { dict entry struct + float object + tuple object header }, # 1.3 overallocation factor for the dict. target_length = int(target_size / (1.3 * (pool_size + 60))) def make_dict(): print ("filling dict up to %d entries..." % target_length) # 1. Initialize the dict from a set of pre-computed random keys. keys = [random.random() for i in range(target_length)] d = dict.fromkeys(keys) # 2. Build the values that will constitute the dict. Each value will, as # far as possible, span a contiguous `pool_size` memory area. # Over 256 bytes per alloc, PyObject_Malloc defers to the system malloc() # We avoid that by allocating tuples of smaller longs. int_size = 200 # 24 roughly accounts for the long object overhead (YMMV) int_start = 1 << ((int_size - 24) * 8 - 7) int_range = range(1, 1 + pool_size // int_size) values = [None] * target_length # Maximize allocation locality by pre-allocating the values for n in range(target_length): values[n] = tuple(int_start + j for j in int_range) if n % 10000 == 0: print (" %d iterations" % n) # The keys are iterated over in their original order rather than in # dict order, so as to randomly spread the values in the internal dict # table wrt. allocation address. for n, k in enumerate(keys): d[k] = values[n] print ("dict filled!") return d if __name__ == "__main__": gc.disable() t1 = time.time() d = make_dict() t2 = time.time() print (" -> %.3f s." % (t2 - t1)) print ("deleting dict...") t2 = time.time() del d t3 = time.time() print (" -> %.3f s." % (t3 - t2)) print ("Finished, you can press Ctrl+C.") time.sleep(10.0) _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com