On 06/02/2017 02:38 AM, Antoine Pitrou wrote:
> I hope those are not the actual numbers you're intending to use ;-)
> I still think that allocating more than 1 or 2MB at once would be
> foolish.  Remember this is data that's going to be carved up into
> (tens of) thousands of small objects.  Large objects eschew the small
> object allocator (not to mention that third-party libraries like Numpy
> may be using different allocation routines when they allocate very
> large data).

Honestly, I'm well aware of what obmalloc does and how it works. I bet I've spent more time crawling around in it in the last year than anybody else on the planet. Mainly because it works so well for CPython, nobody else has needed to bother!

I'm also aware, for example, that if your process grows to consume gigabytes of memory, you're going to have tens of thousands of allocated arenas. The idea that on systems with gigabytes of memory--90%+? of current systems running CPython--we should allocate memory forever in 256kb chunks is faintly ridiculous. I agree that we should start small and ramp up slowly, so Python continues to run well on small computers and doesn't allocate tons of memory for small programs. But I also think we should ramp up *eventually*, for programs that use tens or hundreds of megabytes.

Also note that if we don't touch the allocated memory, smart modern OSes won't actually commit any resources to it. All that happens when your process allocates 1GB is that the OS changes some integers around; it doesn't actually commit any memory to your process until you attempt to write to that memory, at which point it gets mapped in, one local-page-size chunk at a time (4k? 8k? something in that neighborhood and power-of-2 sized). So if we allocate 32mb and only touch the first 1mb, the other 31mb doesn't consume any real resources. I was planning to make the multi-arena code touch memory only when it actually needs to, similar to the way obmalloc lazily consumes memory inside an allocated pool (see the nextoffset field in pool_header), to take advantage of this ubiquitous behavior.
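
Here's a quick, Linux-only sketch of that behavior using the mmap module (just an illustration, not anything destined for obmalloc); the 32mb reservation and the 1mb touch are arbitrary, and the exact RSS delta will vary a bit:

    import mmap

    def rss_kb():
        # Linux-specific: read this process's resident set size from the kernel.
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])

    reserved = mmap.mmap(-1, 32 * 1024 * 1024)      # "allocate" 32mb, anonymously
    before = rss_kb()
    for offset in range(0, 1024 * 1024, mmap.PAGESIZE):
        reserved[offset:offset + 1] = b"\x01"       # touch one byte per page, first 1mb only
    print(f"RSS grew by ~{rss_kb() - before} kb")   # roughly 1024 kb, not 32768 kb

The OS hands over physical pages only for the addresses we actually write to; the untouched 31mb stays as mere bookkeeping in the page tables.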


If I write this multi-arena code, which I might, I was thinking I'd try this approach:

 * leave arenas themselves at 256k
 * start with a 1MB multi-arena size
 * every time I allocate a new multi-arena, multiply the size of the
   next multi-arena by 1.5 (rounding up to 256k each time)
 * every time I free a multi-arena, divide the size of the next
   multi-arena by 2 (rounding up to 256k each time)
 * if allocation of a multi-arena fails, use a binary search algorithm
   to allocate the largest multi-arena possible (rounding up to 256k at
   each step)
 * cap the size of multi-arenas at, let's say, 32mb

So multi-arenas would be 1mb, 1.5mb, 2.25mb, 3.5mb (round up!), etc.
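
For concreteness, here's a throwaway sketch of that growth schedule (the names are mine, nothing from obmalloc, and it only models the grow-by-1.5x path, not the shrink-on-free or the binary-search fallback):

    import itertools

    ARENA = 256 * 1024        # leave arenas themselves at 256k
    CAP = 32 * 1024 * 1024    # cap multi-arenas at 32mb

    def round_up_to_arena(nbytes):
        # Round up to the next multiple of the 256k arena size.
        return -(-nbytes // ARENA) * ARENA

    def multi_arena_sizes(start=1024 * 1024, factor=1.5):
        size = start
        while True:
            yield size
            size = min(round_up_to_arena(int(size * factor)), CAP)

    for size in itertools.islice(multi_arena_sizes(), 6):
        print(f"{size / 2**20:g}mb ({size // ARENA} arenas)")
    # 1mb (4 arenas), 1.5mb (6), 2.25mb (9), 3.5mb (14), 5.25mb (21), 8mb (32)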


Fun fact: Python allocates 16 arenas at the start of the program, just to initialize obmalloc. That consumes 4mb of memory. With the above multi-arena approach, that'd allocate the first three multi-arenas, pre-allocating 19 arenas, leaving 3 unused. It's *mildly* tempting to make the first multi-arena be 4mb, just so this is exactly right-sized, but... naah.
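
To check my arithmetic on the 19, a scratch calculation:

    ARENA = 256 * 1024
    first_three = [2**20, int(1.5 * 2**20), int(2.25 * 2**20)]   # 1mb, 1.5mb, 2.25mb
    arenas = [size // ARENA for size in first_three]
    print(arenas, sum(arenas))   # [4, 6, 9] -> 19 arenas; 16 needed, 3 left over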


//arry/