[Tim]
>> I don't think we need to cater anymore to careless code that mixes
>> system memory calls with obmalloc calls (e.g., if an extension gets
>> memory via `malloc()`, it's its responsibility to call `free()`), and
>> if not then `address_in_range()` isn't really necessary anymore
>> either, and then we could increase the pool size. obmalloc would,
>> however, need a new way to recognize when its version of malloc
>> punted to the system malloc.
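To make that "matching families" rule concrete: every allocation must
be released through the deallocator of the same family, per the
"Memory Management" section of the Python/C API Reference Manual cited
below. The function names here are real CPython C-API calls; the
demo() wrapper itself is just an illustration, not anything in the
proposal:

    #include <Python.h>
    #include <stdlib.h>

    void demo(void)
    {
        /* PyMem family: PyMem_New()/PyMem_Malloc() pair with PyMem_Free(). */
        double *a = PyMem_New(double, 100);
        PyMem_Free(a);       /* correct */
        /* free(a);             WRONG: crosses into the C library family */

        /* Raw family: PyMem_RawMalloc() pairs with PyMem_RawFree(). */
        void *b = PyMem_RawMalloc(256);
        PyMem_RawFree(b);    /* correct */
        /* PyMem_Free(b);       WRONG: Raw and non-Raw are different families */

        /* Object family: PyObject_Malloc() pairs with PyObject_Free(). */
        void *c = PyObject_Malloc(64);
        PyObject_Free(c);    /* correct */

        /* C library family: malloc() pairs with free() -- an extension
           that gets memory via malloc() must itself call free(). */
        void *d = malloc(128);
        free(d);             /* correct */
    }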
[Thomas Wouters <tho...@python.org>]
> Is this really feasible in a world where the allocators can be
> selected (and the default changed) at runtime?

I think so. See the "Memory Management" section of the Python/C API
Reference Manual. It's always been "forbidden" to, e.g., allocate a
thing with PyMem_New() but release it with free(). Ditto mixing a
PyMem_Raw... allocator with a PyMem... deallocator, or a PyObject...
one. Etc. A type's tp_dealloc implementation should damn well know
which memory family the type's allocator used.

However, no actual proposal on the table changes any "fact on the
ground" here. They're all as forgiving of slop as the status quo.

> And what would be an efficient way of detecting allocations punted
> to malloc, if not address_in_range?

_The_ most efficient way is the one almost all allocators used long
ago: use some "hidden" bits right before the address returned to the
user to store info about the block being returned. Like 1 bit to
distinguish between "obmalloc took this out of one of its pools" and
"obmalloc got this from PyMem_Raw... (whatever that maps to - obmalloc
doesn't care)". That would be much faster than what we do now. But on
current 64-bit boxes, "1 bit" turns into "16 bytes" to maintain
alignment, so space overhead becomes 100% for the smallest objects
obmalloc can return :-(
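A minimal sketch of what that classic scheme looks like (hypothetical
names throughout -- tagged_malloc(), tagged_free(), and the
pool_alloc()/pool_free() stand-ins are invented for illustration, and
the 512-byte cutoff merely mirrors obmalloc's small-request threshold;
obmalloc itself stores no such header today):

    #include <stdint.h>
    #include <stdlib.h>

    #define HEADER_SIZE 16    /* 1 bit would do, but 16 bytes keeps
                                 16-byte alignment on 64-bit boxes */
    #define FROM_POOL   1     /* block carved out of one of our pools */
    #define FROM_SYSTEM 0     /* block punted to the system malloc */
    #define SMALL_REQUEST_MAX 512

    /* Stand-ins for a real pool allocator, so the sketch compiles. */
    static void *pool_alloc(size_t n) { return malloc(n); }
    static void pool_free(void *p)    { free(p); }

    void *tagged_malloc(size_t n)
    {
        uint8_t tag = (n <= SMALL_REQUEST_MAX) ? FROM_POOL : FROM_SYSTEM;
        char *base = (tag == FROM_POOL) ? pool_alloc(n + HEADER_SIZE)
                                        : malloc(n + HEADER_SIZE);
        if (base == NULL)
            return NULL;
        base[0] = (char)tag;          /* hide the tag below the pointer */
        return base + HEADER_SIZE;    /* user never sees the header */
    }

    void tagged_free(void *p)
    {
        if (p == NULL)
            return;
        char *base = (char *)p - HEADER_SIZE;
        /* One read, no address_in_range() probe: the tag says who owns it. */
        if (base[0] == FROM_POOL)
            pool_free(base);
        else
            free(base);
    }

The 100% overhead for the smallest objects falls straight out of this:
a 16-byte header in front of a 16-byte allocation doubles its size.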
Neil Schemenauer takes a different approach in the recent "radix tree
arena map for obmalloc" thread here. We exchanged ideas on that until
it got to the point that the tree levels only need to trace out
prefixes of obmalloc arena addresses. That is, the new space burden of
the radix tree appears quite reasonably small. It doesn't appear to be
possible to make it faster than the current address_in_range(), but in
small-scale testing so far speed appears comparable.

> Getting rid of address_in_range sounds like a nice idea, and I would
> love to test how feasible it is -- I can run such a change against a
> wide selection of code at work, including a lot of third-party
> extension modules, but I don't see an easy way to do it right now.

Neil's branch is here:

https://github.com/nascheme/cpython/tree/obmalloc_radix_tree

It's effectively a different _implementation_ of the current
address_in_range(), one that doesn't ever need to read possibly
uninitialized memory, and couldn't care less about the OS page size.
For the latter reason, it's by far the clearest way to enable
expanding pool size above 4 KiB. My PR also eliminates the pool size
limitation:

https://github.com/python/cpython/pull/13934

but at the cost of breaking bigger pools up internally into 4K
regions, so the excruciating current address_in_range() black magic
still works.

Neil and I are both keen _mostly_ to increase pool and arena sizes.
The bigger they are, the more time obmalloc can spend in its fastest
code paths. A question we can't answer yet (or possibly ever) is how
badly that would hurt Python's ability to return arenas to the system,
in long-running apps that go through phases of low and high memory
need. I don't run anything like that - not a world I've ever lived in.
All my experiments so far say, for programs that are neither horrible
nor wonderful in this respect:

1. An arena size of 4 KiB is most effective for that.

2. There's significant degradation in moving even to 8 KiB arenas.

3. Which continues getting worse the larger the arenas.

4. Until reaching 128 KiB, at which point the rate of degradation
   falls a lot.

So the current 256 KiB arenas already suck for such programs.

For "horrible" programs, not even tiny 4K arenas help much. For
"wonderful" programs, not even 16 MiB arenas hurt arena recycling
effectiveness. So if you have real programs keen to "return memory to
the system" periodically, it would be terrific to get info about how
changing arena size affects their behavior in that respect.

My PR uses 16 KiB pools and 1 MiB arenas, quadrupling the status quo.
Because "why not?" ;-)

Neil's branch has _generally_, but not always, used 16 MiB arenas.
The larger the arenas in his branch, the smaller the radix tree needs
to grow.
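To make that tradeoff concrete, here's an illustrative two-level
lookup over arena-address prefixes. It is not Neil's actual code: it
assumes 48-bit user-space addresses, 16 MiB (1 << 24 byte) arenas,
and -- unrealistically -- arenas aligned to arena-size boundaries, a
complication the real branch has to deal with. All the names
(arena_map_is_covered(), arena_map_mark(), leaf_t) are invented:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define ARENA_BITS  24                        /* 16 MiB arenas */
    #define ADDR_BITS   48                        /* user-space address bits */
    #define PREFIX_BITS (ADDR_BITS - ARENA_BITS)  /* 24 bits the tree maps */
    #define LEVEL_BITS  (PREFIX_BITS / 2)         /* 12 bits per level */
    #define FANOUT      (1 << LEVEL_BITS)         /* 4096 entries per node */

    typedef struct { bool covered[FANOUT]; } leaf_t;   /* ~4 KiB each */
    static leaf_t *root[FANOUT];                       /* leaves allocated lazily */

    /* The address_in_range() replacement: two indexed loads, no read of
       possibly uninitialized memory, no dependence on the OS page size.
       The low ARENA_BITS of the address are an offset inside an arena,
       so the tree never needs to see them. */
    bool arena_map_is_covered(const void *p)
    {
        uintptr_t prefix = (uintptr_t)p >> ARENA_BITS;
        unsigned i1 = (unsigned)(prefix >> LEVEL_BITS) & (FANOUT - 1);
        unsigned i2 = (unsigned)prefix & (FANOUT - 1);
        return root[i1] != NULL && root[i1]->covered[i2];
    }

    /* Update the map when an arena is acquired (covered = true) or
       returned to the system (covered = false). */
    void arena_map_mark(const void *arena_base, bool covered)
    {
        uintptr_t prefix = (uintptr_t)arena_base >> ARENA_BITS;
        unsigned i1 = (unsigned)(prefix >> LEVEL_BITS) & (FANOUT - 1);
        unsigned i2 = (unsigned)prefix & (FANOUT - 1);
        if (root[i1] == NULL)
            root[i1] = calloc(1, sizeof(leaf_t));
        if (root[i1] != NULL)
            root[i1]->covered[i2] = covered;
    }

Each doubling of the arena size shaves a bit off PREFIX_BITS, roughly
halving the number of slots the tree can ever need -- which is why
bigger arenas mean a smaller tree.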