[Antoine Pitrou <solip...@pitrou.net>]
> For the record, there's another contender in the allocator
> competition now:
> https://github.com/microsoft/mimalloc/

Thanks!  From a quick skim, most of it is addressing things obmalloc doesn't:

1) Efficient thread safety (we rely on the GIL).

2) Directly handling requests of all sizes (we only deal with <= 512
   bytes - see the sketch after this list).

3) Funky stuff to help refcounting languages that want to guarantee
   that freeing an object takes bounded time (the problem is that large
   objects can contain an unbounded number of other objects that
   will then also die - so mimalloc has complications to interact with
   client-supplied functions that parcel out the tree of objects killed
   by a decref, so the freeing can be done incrementally over time).
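
For concreteness, a minimal sketch of the dispatch behind #2 - not
CPython's actual code, just the shape of it (alloc_sketch and its
fall-back to malloc are stand-ins):  requests over
SMALL_REQUEST_THRESHOLD (512 bytes) go straight to the platform
allocator, which is why the free path later needs address_in_range()
to tell the two kinds of memory apart.

#include <stddef.h>
#include <stdlib.h>

#define SMALL_REQUEST_THRESHOLD 512   /* obmalloc's small-request cutoff */
#define ALIGNMENT_SHIFT 3             /* size classes are multiples of 8 bytes */

/* Sketch only: route small requests to a size class, punt the rest. */
static void *
alloc_sketch(size_t nbytes)
{
    if (nbytes == 0 || nbytes > SMALL_REQUEST_THRESHOLD) {
        /* Not obmalloc's problem: hand it to the system allocator. */
        return malloc(nbytes);
    }
    /* Map the request onto one of 64 size classes (8, 16, ..., 512). */
    size_t size_class = (nbytes - 1) >> ALIGNMENT_SHIFT;
    /* A real implementation would pop a block from a usable pool of this
     * class; the sketch just rounds up and defers to malloc. */
    return malloc((size_class + 1) << ALIGNMENT_SHIFT);
}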

I don't think it's been explicitly pointed out before, but #2 is an
alternative to my PR and Neil's radix tree.  If we insist that
programs play by the rules (only free through obmalloc what was
obtained from obmalloc), and obmalloc directly controls every piece of
memory it hands out (no falling back to the system malloc for large
requests), then the new definition of address_in_range()
becomes

#define address_in_range(p, pool) true

;-)

This deserves more thought than I've ever given to it.
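
For contrast, a rough paraphrase of what address_in_range() does now -
the types below are stand-ins for obmalloc's internals, not the real
declarations:  it trusts the pool header's arenaindex field, which may
live in memory obmalloc never handed out, and that read is the part my
PR and Neil's radix tree are trying to tame.  Under the "obmalloc
handles everything" scheme, all of this collapses to the one-line
#define above.

#include <stdint.h>

/* Stand-ins for obmalloc's internals - just enough to show the shape. */
typedef struct { unsigned int arenaindex; } pool_header;
typedef struct { uintptr_t address; } arena_object;
#define ARENA_SIZE (256 << 10)          /* 256 KiB in <= Python 3.8 */
static arena_object *arenas;            /* grows as arenas are allocated */
static unsigned int maxarenas;

/* Rough paraphrase, not the verbatim CPython source. */
static int
address_in_range_sketch(void *p, pool_header *pool)
{
    unsigned int idx = pool->arenaindex;   /* may read memory we don't own */
    return idx < maxarenas
        && (uintptr_t)p - arenas[idx].address < (uintptr_t)ARENA_SIZE
        && arenas[idx].address != 0;
}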

Leaving aside all the above, they emphasize this:

"""
Many allocators use a single free list per size class which can lead
to bad spatial locality where objects belonging to a single structure
can be spread out over the entire heap.
...
To improve the spatial locality of allocation, mimalloc use free list
sharding where the heap is divided into pages (per size-class) with a
free list per page (where pages are usually 64KiB for small objects).
"""

Which is exactly what obmalloc already does, except we call their
"pages" "pools", and ours are 16x smaller (4 KiB pools vs their 64 KiB
pages in <= Python 3.8) or 4x smaller (16 KiB pools in my PR).

They call our "arenas" "segments", and theirs are a minimum of 4 MiB
(vs our 256 KiB arenas in <= Python 3.8, or 1 MiB in my PR).

Their motivations for making these "large" are the same as mine:
maximize the time they can stay in the very zippy "fast paths".

Something I'm torn on:  within a pool, obmalloc has two ways to find
the next free block.  The pool has its own block free list (as in
mimalloc), but _also_ has a "bump allocator":  The pool is carved into
blocks one at a time, low address to high, whenever the pool's free
list is NULL.  This was part of Vladimir's desire to never touch a
piece of memory before it's actually needed to satisfy a client
request.
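
Here's a hedged sketch of those two paths.  The field names mirror
obmalloc's pool header, but this isn't the verbatim code - the real
thing maintains a slightly different invariant on freeblock:

typedef unsigned char block;

/* Stand-in for obmalloc's pool header; only the fields the two paths use. */
typedef struct {
    block *freeblock;            /* head of the pool's singly linked free list */
    unsigned int nextoffset;     /* offset of the next never-carved block */
    unsigned int maxnextoffset;  /* last offset at which a carve can still fit */
} pool_header;

static block *
pool_alloc_sketch(pool_header *pool, unsigned int blocksize)
{
    block *bp = pool->freeblock;
    if (bp != NULL) {
        /* Fastest path: pop the free list; a free block's first word
         * holds a pointer to the next free block. */
        pool->freeblock = *(block **)bp;
        return bp;
    }
    /* Second-fastest path: the free list is empty, so try the bump
     * allocator - carve the next never-used block off the pool. */
    if (pool->nextoffset <= pool->maxnextoffset) {
        bp = (block *)((char *)pool + pool->nextoffset);
        pool->nextoffset += blocksize;
        return bp;
    }
    return NULL;   /* pool is full; the caller moves on to another pool */
}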

But it does complicate and slow the code on the second-fastest
allocation path (when the pool's free list _is_ NULL).  The
conditional "OK, the free list is NULL - so is there room for another
bump allocation?" is a waste of time after all the pool's blocks have
been passed out at least once.  The bump allocator can never succeed
again then, but we keep testing for it forever.

The alternative is to break the pool into blocks, and link them into a
free list, in one gulp when a size class is assigned to a free pool.
That mutates all the blocks in the pool (to store the list pointers),
even if all but one will never be used.  In return, the code for
testing whether the bump allocator can find room, and the pool header
members supporting the bump allocator, could be thrown out.  That's
what mimalloc appears to do, and they said it sped things up a bit
overall.
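
A sketch of that alternative, using the same kind of stand-in types
(POOL_OVERHEAD here is just a placeholder for the real rounded-up
header size):

#include <stddef.h>

#define POOL_SIZE      (4 * 1024)   /* <= 3.8 pool size; my PR uses 16 KiB */
#define POOL_OVERHEAD  48           /* placeholder for the rounded header size */

typedef unsigned char block;
typedef struct {
    block *freeblock;   /* the only field left - no bump-allocator members */
} mini_pool;

/* Thread every block in a fresh pool onto the free list in one gulp.
 * This writes a pointer into every block up front (touching memory that
 * may never be used), but the allocation path then needs only the
 * free-list pop. */
static void
carve_whole_pool(mini_pool *pool, unsigned int blocksize)
{
    char *start = (char *)pool + POOL_OVERHEAD;
    unsigned int nblocks = (POOL_SIZE - POOL_OVERHEAD) / blocksize;
    block *next = NULL;

    /* Link from the highest-addressed block down to the lowest, so the
     * list hands blocks out in increasing address order. */
    for (unsigned int i = nblocks; i-- > 0; ) {
        block *bp = (block *)(start + (size_t)i * blocksize);
        *(block **)bp = next;   /* first word of each free block = next one */
        next = bp;
    }
    pool->freeblock = next;
}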

But I think I'll leave that one for someone else ;-)