[Antoine Pitrou <solip...@pitrou.net>]
> For the record, there's another contender in the allocator
> competition now:
> https://github.com/microsoft/mimalloc/
Thanks!  From a quick skim, most of it is addressing things obmalloc
doesn't:

1) Efficient thread safety (we rely on the GIL).

2) Directly handling requests of all sizes (we only deal with <= 512
bytes).

3) Funky stuff to help refcounting languages that want to guarantee
that freeing an object takes bounded time.  The problem is that large
objects can contain an unbounded number of other objects that will
then also die, so mimalloc has complications to interact with
client-supplied functions that parcel out the tree of objects killed
by a decref, so the freeing can be done incrementally over time.

I don't think it's been explicitly pointed out before, but #2 is an
alternative to my PR and Neil's radix tree.  If we insist that
programs play by the rules, and obmalloc controls every piece of
memory it hands out, then the new definition of address_in_range()
becomes

    #define address_in_range(p, pool) true

;-)  This deserves more thought than I've ever given to it.

Leaving aside all the above, they emphasize this:

"""
Many allocators use a single free list per size class which can lead
to bad spatial locality where objects belonging to a single structure
can be spread out over the entire heap. ... To improve the spatial
locality of allocation, mimalloc uses free list sharding where the
heap is divided into pages (per size-class) with a free list per page
(where pages are usually 64KiB for small objects).
"""

Which is exactly what obmalloc already does, except we call their
"pages" "pools", and ours are 16x smaller (<= Python 3.8) or 4x
smaller (my PR).  They call our "arenas" "segments", and theirs are a
minimum of 4 MiB.  Their motivations for making these "large" are the
same as mine: maximize the time they can stay in the very zippy "fast
paths".

Something I'm torn on: within a pool, obmalloc has two ways to find
the next free block.
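To make the "free list sharding" parallel concrete, here's a minimal
C sketch of a per-pool free list - each pool serves one size class
and keeps its own list head, so a pop or push never touches a global
structure.  All names (pool_t, pool_alloc, pool_free) are
illustrative; this is not obmalloc's or mimalloc's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* A free block stores the link to the next free block in itself. */
typedef struct block { struct block *next; } block;

/* One pool per size class, with its own free list (the "shard"). */
typedef struct pool {
    block *freeblock;   /* head of this pool's free list            */
    size_t blocksize;   /* size class served by this pool           */
    /* ... arena links, used counts, etc. elided ...                */
} pool_t;

/* Allocate from a specific pool: pop the head of its free list. */
static void *pool_alloc(pool_t *p)
{
    block *b = p->freeblock;
    if (b == NULL)
        return NULL;    /* caller falls back to another pool        */
    p->freeblock = b->next;
    return b;
}

/* Free back to the owning pool: push onto that pool's own list,
 * so blocks freed together tend to be handed out together again. */
static void pool_free(pool_t *p, void *v)
{
    block *b = v;
    b->next = p->freeblock;
    p->freeblock = b;
}
```

Because each list is LIFO and local to one pool, consecutive
allocations of a size class tend to come from the same small region,
which is the spatial-locality win both allocators are after.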
The pool has its own block free list (as in mimalloc), but _also_ has
a "bump allocator": the pool is carved into blocks one at a time, low
address to high, whenever the pool's free list is NULL.  This was
part of Vladimir's desire to never touch a piece of memory before
it's actually needed to satisfy a client request.

But it does complicate and slow the code on the second-fastest
allocation path (when the pool's free list _is_ NULL).  The
conditional "OK, the free list is NULL - so is there room for another
bump allocation?" is a waste of time after all the pool's blocks have
been passed out at least once.  The bump allocator can never succeed
again then, but we keep testing for it forever.

The alternative is to break the pool into blocks, and link them into
a free list, in one gulp when a size class is assigned to a free
pool.  That mutates all the blocks in the pool (to store the list
pointers), even if all but one will never be used.  In return, the
code for testing whether the bump allocator can find room, and the
pool header members supporting the bump allocator, could be thrown
out.  That's what mimalloc appears to do, and they said it sped
things up a bit overall.

But I think I'll leave that one for someone else ;-)

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/RSBCIDD6YBDSEPQIBGTOZHVS63PS7LTU/
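P.S. The two intra-pool strategies contrasted above can be sketched
in C.  These are toy illustrations under assumed names (pool_t,
nextoffset, init_all_at_once, ...), not CPython's or mimalloc's
actual code; the point is the extra branch on the obmalloc-style path
versus the up-front mutation of every block on the mimalloc-style
path:

```c
#include <assert.h>
#include <stddef.h>

typedef struct block { struct block *next; } block;

typedef struct pool {
    char   *nextoffset;    /* bump pointer: next never-used block      */
    char   *maxnextoffset; /* address of the last carvable block       */
    block  *freeblock;     /* free list of previously-used blocks      */
    size_t  blocksize;
} pool_t;

/* obmalloc style: try the free list, then fall back to the bump
 * allocator.  The bump test keeps running forever, even after every
 * block has been passed out once and it can never succeed again. */
static void *alloc_obmalloc_style(pool_t *p)
{
    if (p->freeblock != NULL) {
        block *b = p->freeblock;
        p->freeblock = b->next;
        return b;
    }
    if (p->nextoffset <= p->maxnextoffset) {  /* the extra branch */
        void *b = p->nextoffset;
        p->nextoffset += p->blocksize;
        return b;
    }
    return NULL;                              /* pool is full */
}

/* mimalloc style: thread every block onto the free list in one gulp
 * when the pool is assigned a size class.  This touches (writes a
 * pointer into) every block up front, even ones never handed out. */
static void init_all_at_once(pool_t *p, char *start, char *end)
{
    p->freeblock = NULL;
    for (char *c = start; c + p->blocksize <= end; c += p->blocksize) {
        block *b = (block *)c;
        b->next = p->freeblock;
        p->freeblock = b;
    }
}

/* ... after which allocation is just an unconditional list pop. */
static void *alloc_mimalloc_style(pool_t *p)
{
    block *b = p->freeblock;
    if (b != NULL)
        p->freeblock = b->next;
    return b;                                 /* NULL => pool is full */
}
```

The trade is one conditional per allocation on the second-fastest
path versus touching every block in the pool once at setup.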