[Tim]
> - For truly effective RAM releasing, we would almost certainly need to
> make major changes, to release RAM at an OS page level.   256K arenas
> were already too fat a granularity.

We can approximate that closely right now by using 4K pools _and_ 4K
arenas:  one pool per arena, and mmap()/munmap() are then called on
one page at a time.

[Don't try this at home ;-)  There are subtle assumptions in the code
that there are at least two pools in an arena, and those have to be
overcome first.]

For memcrunch.py, using 200x the original initial objects, this works
quite well!  Note that this still uses our current release-arenas
heuristic:  the only substantive change from the status quo is setting
ARENA_SIZE to POOL_SIZE (both 4 KiB - one page).

# arenas allocated total           =              873,034
# arenas reclaimed                 =              344,380
# arenas highwater mark            =              867,540
# arenas allocated current         =              528,654
528654 arenas * 4096 bytes/arena   =        2,165,366,784

# bytes in allocated blocks        =        1,968,234,336
# bytes in available blocks        =          141,719,280
5349 unused pools * 4096 bytes     =           21,909,504
# bytes lost to pool headers       =           25,118,640
# bytes lost to quantization       =            8,385,024
# bytes lost to arena alignment    =                    0
Total                              =        2,165,366,784

So, at the end, space utilization is over 90%:

1,968,234,336 / 2,165,366,784 = 0.90896117
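
The stats blocks here are the kind of output sys._debugmallocstats()
writes to stderr.  As a minimal sketch of where that ratio comes from,
with the two relevant numbers from the run above hard-coded:

    import sys

    # ... run the workload (memcrunch.py's allocation phases) ...

    # Dumps the small-object allocator stats (arena counts, bytes in
    # allocated blocks, pool-header/quantization overhead, ...) to stderr.
    sys._debugmallocstats()

    # Utilization = bytes in allocated blocks / total bytes held in arenas.
    arenas_current = 528_654          # "# arenas allocated current"
    allocated      = 1_968_234_336    # "# bytes in allocated blocks"
    print(allocated / (arenas_current * 4096))   # -> 0.90896117...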

OTOH, an even nastier version of the other program I posted isn't
helped much at all, ending like so after phase 10:

# arenas allocated total           =            1,025,106
# arenas reclaimed                 =               30,539
# arenas highwater mark            =            1,025,098
# arenas allocated current         =              994,567
994567 arenas * 4096 bytes/arena   =        4,073,746,432

# bytes in allocated blocks        =          232,861,440
# bytes in available blocks        =        2,064,665,008
424741 unused pools * 4096 bytes   =        1,739,739,136
# bytes lost to pool headers       =           27,351,648
# bytes lost to quantization       =            9,129,200
# bytes lost to arena alignment    =                    0
Total                              =        4,073,746,432

So space utilization is under 6%:

232,861,440 / 4,073,746,432 = 0.0571615

Believe it or not, that's slightly (but _only_ slightly) better than
when using the current 256K/4K arena/pool mix, which released no
arenas at all and ended with

232,861,440 / 4,199,022,592 = 0.05545611

utilization.

So:

- There's substantial room for improvement in releasing RAM by
tracking it at OS page level.

- But the current code design is (very!) poorly suited for that.

- In some non-contrived cases it wouldn't really help anyway.

A natural question is how much arena size affects final space
utilization for memcrunch.py.  Every successive increase over one pool
hurts, but eventually it stops mattering much.  Here are the possible
power-of-2 arena sizes, using 4K pools, ending with the smallest for
which no arenas get reclaimed:

1,968,234,336 / 2,165,366,784 = 0.90896117
528654 arenas * 4096 bytes/arena   =        2,165,366,784
# bytes in allocated blocks        =        1,968,234,336

1,968,234,336 / 2,265,399,296 = 0.86882447
276538 arenas * 8192 bytes/arena   =        2,265,399,296
# bytes in allocated blocks        =        1,968,234,336

1,968,235,360 / 2,441,314,304 = 0.80621957
149006 arenas * 16384 bytes/arena  =        2,441,314,304
# bytes in allocated blocks        =        1,968,235,360

1,968,235,360 / 2,623,799,296 = 0.75014707
80072 arenas * 32768 bytes/arena   =        2,623,799,296
# bytes in allocated blocks        =        1,968,235,360

1,968,235,360 / 2,924,216,320 = 0.67308131
44620 arenas * 65536 bytes/arena   =        2,924,216,320
# bytes in allocated blocks        =        1,968,235,360

1,968,235,360 / 3,299,475,456 = 0.59652978
25173 arenas * 131072 bytes/arena  =        3,299,475,456
# bytes in allocated blocks        =        1,968,235,360

1,968,235,360 / 3,505,913,856 = 0.56140437
13374 arenas * 262144 bytes/arena  =        3,505,913,856
# bytes in allocated blocks        =        1,968,235,360

1,968,235,360 / 3,552,051,200 = 0.55411233
6775 arenas * 524288 bytes/arena   =        3,552,051,200
# bytes in allocated blocks        =        1,968,235,360

1,968,235,360 / 3,553,624,064 = 0.55386707
3389 arenas * 1048576 bytes/arena  =        3,553,624,064
# bytes in allocated blocks        =        1,968,235,360
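
Each ratio is just bytes in allocated blocks divided by (arena count *
arena size); recomputing them from the figures as reported:

    # (arena count, arena size in bytes, bytes in allocated blocks),
    # taken from the runs reported above
    runs = [
        (528654,    4096, 1_968_234_336),
        (276538,    8192, 1_968_234_336),
        (149006,   16384, 1_968_235_360),
        ( 80072,   32768, 1_968_235_360),
        ( 44620,   65536, 1_968_235_360),
        ( 25173,  131072, 1_968_235_360),
        ( 13374,  262144, 1_968_235_360),
        (  6775,  524288, 1_968_235_360),
        (  3389, 1048576, 1_968_235_360),
    ]
    for arenas, arena_size, allocated in runs:
        print(f"{arena_size // 1024:>5}K: "
              f"{allocated / (arenas * arena_size):.8f}")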

Most of the damage was done by the time we reached 128K arenas, and
"almost all" when reaching 256K.

I expect that's why I'm not seeing much of any effect (on arena
recycling effectiveness) in moving from the current 256K/4K to the
PR's 1M/16K:  256K/4K already required "friendly"
allocation/deallocation patterns for the status quo to do real good,
and 1M/16K just requires patterns that are friendlier still ;-)