[Jack Diederich]
>> PyObject_MALLOC does a good job of reusing small allocations, but it
>> can't quite manage the same speed as a free list, especially for things
>> that have some extra setup involved (tuples have a free list for each
>> length).
[Martin v. Löwis]
> I would question that statement, for any practical purposes. The cost of
> a tuple comes from setting the elements to NULL, and that has to be done
> regardless of whether they were allocated new or came from the list.

Except _that_ overhead is trivial for small tuples, and small tuples are
the only kind the free lists cache. There are many other overheads. If a
tuple is taken off a free list, we get to skip the integer multiplication
and division checking for overflow before calling PyObject_GC_NewVar. We
also get to skip the call to PyObject_GC_NewVar itself. That in turn skips
another integer multiplication in the _PyObject_VAR_SIZE macro, and a call
to _PyObject_GC_Malloc. That in turn skips a call to PyObject_MALLOC, and
conditionals checking whether it's time to trigger a gc collection. All of
that is highly significant compared to the cost of setting at most a
handful of slots to NULL inline.

> Likewise, the GC management has to be done regardless.

_PyObject_GC_TRACK expands to 5 inlined simple stores and a predictable
branch, so it is often more expensive than setting the tuple slots to
NULL. But, as above, we get to skip three layers of function call and
"will it overflow?" arithmetic in the service of _setting up_ an object
for gc initially. Only the gc track/untrack pair remains for tuples in a
free list.

> So I expect that the speedup is rather minor, and not worth it.
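The shape of the machinery being skipped can be sketched in Python terms. This is a toy model, not CPython's implementation (which is C, in Objects/tupleobject.c); the names FakeTuple, tuple_new, and tuple_dealloc are invented for illustration:

```python
# Toy model of tupleobject.c's per-length free lists (hypothetical
# Python sketch; the real code is C).  Reusing a parked object skips
# the whole "overflow check -> GC header setup -> malloc" chain; only
# resetting the slots remains.

MAXSAVESIZE = 20        # lengths 1..19 get a free list, as in CPython
MAXSAVEDTUPLES = 2000   # per-list cap, mirroring the C #define

free_lists = [[] for _ in range(MAXSAVESIZE)]

class FakeTuple(object):
    def __init__(self, size):
        # Stands in for PyObject_GC_NewVar + _PyObject_GC_Malloc work.
        self.size = size
        self.slots = [None] * size   # the "set elements to NULL" cost

def tuple_new(size):
    """Hand back a cached object when possible, else 'allocate' one."""
    if 0 < size < MAXSAVESIZE and free_lists[size]:
        t = free_lists[size].pop()   # fast path: no malloc, no GC setup
        t.slots = [None] * size      # this cost is paid on both paths
        return t
    return FakeTuple(size)           # slow path: full allocation

def tuple_dealloc(t):
    """Instead of freeing small tuples, park them for reuse."""
    if 0 < t.size < MAXSAVESIZE and len(free_lists[t.size]) < MAXSAVEDTUPLES:
        free_lists[t.size].append(t)
```

Creating, "freeing", and re-creating a length-3 tuple hands back the very same object, which is the whole point of the fast path:

```python
a = tuple_new(3)
tuple_dealloc(a)
b = tuple_new(3)
assert a is b        # recycled off the free list, not reallocated
```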
Depends on the app :-)  Here's a test case that gets supernatural benefit
from small-tuple caching:

"""
def doit():
    N1000 = [None] * 1000
    basetup = (5,)
    for i in N1000:
        tups = []
        push = tups.append
        for j in xrange(10):
            for k in N1000:
                push(basetup * j)

from time import clock as now
times = []
for i in range(3):
    start = now()
    doit()
    finish = now()
    times.append(finish - start)
print sorted(times)
"""

With current trunk that printed

    [2.9363677646013846, 2.9489729031005703, 2.9689538729183949]

After changing

    #define MAXSAVEDTUPLES  2000

to

    #define MAXSAVEDTUPLES  0

the times zoomed to

    [4.5894824930441587, 4.6023111649343242, 4.629560027293957]

That's pretty dramatic. OTOH, I don't have any apps that do that <0.5
wink>, and there's another downside: on SF recently, someone complained
that the 2.5 obmalloc work to release unused arenas wasn't doing much in
his (perhaps equally artificial -- don't know) test case. It surprised me
too, so I dug into it. The problem turned out to be that piles of arenas
were being kept "artificially" alive, because obmalloc was the original
source of thousands of tuple objects being kept alive (from obmalloc's
POV) in tupleobject.c's free lists. If you're unlucky, it only takes one
tiny tuple in one free list to keep an entire 256KB arena alive -- and if
you're very unlucky, you manage to allocate objects in such a way that
this happens repeatedly. His test also created lots of dicts along the
way, and each arena got mostly filled up with dict objects and a relative
handful of small tuples. By the time all of this became trash, the tuples
were spread over a few hundred arenas, which effectively became immortal.
While it doesn't make any real sense, I've seen repeatedly that users
_try_ calling gc.collect() in such cases (it doesn't make sense because
this has nothing to do with cyclic gc).
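For anyone rerunning the benchmark on a current interpreter, here is a rough Python 3 translation (an assumption on my part, not part of the original post: `time.clock` was later removed, so `perf_counter` stands in, and `xrange` becomes `range`):

```python
# Python 3 port of the timing harness above.  The workload is the same:
# each call to doit() builds ~10 million small tuples of lengths 0..9,
# exactly the sizes the per-length free lists cache.
from time import perf_counter as now

def doit():
    N1000 = [None] * 1000
    basetup = (5,)
    for i in N1000:
        tups = []
        push = tups.append
        for j in range(10):          # xrange -> range
            for k in N1000:
                push(basetup * j)    # tuple of length j (0..9)

times = []
for i in range(3):
    start = now()
    doit()
    finish = now()
    times.append(finish - start)
print(sorted(times))
```

Absolute numbers will differ wildly from the 2006 trunk figures quoted above, of course; only the relative effect of the free-list cap is of interest.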
But that suggests it could be a _pragmatic_ win to add an internal "clear
all known free lists" function, which gc could call at times, or which
could even be forced by the user via a `gc` entry point or a gc.collect()
option.

The immortal & unbounded int and float free lists don't cause obmalloc
arenas to stay alive unduly, because they get their memory in chunks
straight from the system malloc(). But they must be the most frequent
source of complaints from newbies and Zope users ;-), who do things like
range(10000000) and then marvel that some 120MB of VM is still being used
after nuking the list and doing a gc.collect(). I get weary of explaining
that one :-(.