[Antoine Pitrou <solip...@pitrou.net>]
> The interesting thing here is that in many situations, the size is
> known up front when deallocating - it is simply not communicated to the
> deallocator because the traditional free() API takes a sole pointer,
> not a size. But CPython could communicate that size easily if we
> would like to change the deallocation API. Then there's no bother
> looking up the allocated size in sophisticated lookup structures.

That could work (to make it possible to increase obmalloc's pool
size). Except ...

> I'll note that jemalloc provides such APIs:
> http://jemalloc.net/jemalloc.3.html
>
> """The dallocx() function causes the memory referenced by ptr to be
> made available for future allocations.
>
> The sdallocx() function is an extension of dallocx() with a size
> parameter to allow the caller to pass in the allocation size as an
> optimization."""

obmalloc doesn't intend to be a general-purpose allocator - it only
aims at optimizing "small" allocations, punting to the system for
everything beyond that. Unless the size is _always_ passed in (on
every free() and realloc() spelling it supports), an "optimization"
doesn't help it much. It needs a bulletproof way to determine
whether it, or system malloc/realloc, originally obtained an address
passed in. If the size is always passed in, no problem (indeed, a
single bit could suffice). But if it's ever possible that the size
won't be passed in, all the runtime machinery to figure that out on
its own needs to be able to handle all addresses.

Like now: if the size were passed in, obmalloc could test the size
instead of doing the `address_in_range()` dance(*). But if it's ever
possible that the size won't be passed in, all the machinery
supporting `address_in_range()` still needs to be there, and every
obmalloc spelling of malloc/realloc needs to ensure that machinery
will work if the returned address is passed back to an obmalloc
free/realloc spelling without the size.

The "only" problem with address_in_range is that it limits us to a
maximum pool size of 4K. Just for fun, I boosted that to 8K to see
how likely segfaults really are, and a Python built that way couldn't
even get to its first prompt before dying with an access violation
(Windows-speak for segfault).

Alas, that makes it hard to guess how much value there would be for
Python programs if the pool size could be increased - can't even get
Python started.

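To make the 4K limit concrete, here is a rough Python model of the lookup, not the C code in obmalloc.c. It assumes the check works by rounding the address down to a pool boundary and reading an arena index from the pool header found there, and that this read is only safe while the header shares a mapped page with the address itself. `ToyMemory`, `AccessViolation`, the 4096-byte page size, and all the addresses are made up for illustration.

```python
PAGE = 4096               # assumed system page size
ARENA_SIZE = 256 * 1024   # arena size mentioned in the message

class AccessViolation(Exception):
    """Stands in for a segfault when an unmapped page is read."""

class ToyMemory:
    """Toy address space: only pages recorded in `mapped` may be read."""
    def __init__(self):
        self.mapped = set()   # readable page numbers
        self.words = {}       # address -> integer stored there

    def map_range(self, start, size):
        first, last = start // PAGE, (start + size - 1) // PAGE
        self.mapped.update(range(first, last + 1))

    def read(self, addr):
        if addr // PAGE not in self.mapped:
            raise AccessViolation(hex(addr))
        return self.words.get(addr, 0xDEAD)   # arbitrary junk if never written

def address_in_range(mem, arenas, p, pool_size):
    pool = p & ~(pool_size - 1)   # round p down to its pool boundary
    idx = mem.read(pool)          # pool header is assumed to hold the arena index
    return idx < len(arenas) and arenas[idx] != 0 and 0 <= p - arenas[idx] < ARENA_SIZE

mem = ToyMemory()
arenas = [0x40000]                       # one obmalloc arena, base address made up
mem.map_range(arenas[0], ARENA_SIZE)
for pool in range(arenas[0], arenas[0] + ARENA_SIZE, PAGE):
    mem.words[pool] = 0                  # every 4K pool header records arena index 0

own = arenas[0] + 3 * PAGE + 64          # address inside an obmalloc pool
foreign = 0x11010                        # pretend system malloc returned this
mem.map_range(foreign, 32)               # only its own page (0x11000) is mapped

print(address_in_range(mem, arenas, own, 4096))      # True
print(address_in_range(mem, arenas, foreign, 4096))  # False: junk index rejected
try:
    address_in_range(mem, arenas, foreign, 8192)     # header read lands on page 0x10000
except AccessViolation as exc:
    print("access violation at", exc)
```

With 4K pools the header read always stays on the page that holds the address being tested, so junk can be read but never faults. With 8K pools the rounded-down address can fall on a neighbouring page that nobody mapped, which is the failure the model raises as `AccessViolation`.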
We could eliminate the pool size restriction in many ways. For
example, we could store the addresses obtained from the system
malloc/realloc - but not yet freed - in a set, perhaps implemented as
a radix tree to cut the memory burden. But digging through 3 or 4
levels of a radix tree to determine membership is probably
significantly slower than address_in_range.

I can think of a way to do it slightly faster than (but related to)
address_in_range, but it would (on a 64-bit box) require adding 24
more bytes for each system-malloc/realloc allocation. 8 of those
bytes would be pure waste, because the Alignment Gods appear to
require 16-byte alignment for every allocation on a 64-bit box now.

In stark contrast, the extra memory burden of the current
address_in_range is an insignificant 8 bytes per _arena_ (256 KB, and
"should be" larger now).

Another approach: keep address_in_range as-is, but add new internal
structure to larger pools, to repeat the arena index every 4KB. But
that fights somewhat against the goal of larger pools.

Etc. ;-)

(*) Actually not quite true. If a "large" object is obtained from
obmalloc now (meaning it actually came from the system malloc), then
cut back to a "small" size by a realloc, it _remains_ under the
control of the system malloc now. Passing in the newer "small" size
to a free() later would cause obmalloc to get fatally confused about
that.
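The footnote's scenario can be walked through with a toy model. This is only a sketch: `ToyHeap`, `owner_guessed_from_size`, and the 512-byte cutoff are assumptions made for illustration, and the real realloc logic in obmalloc.c is more involved (size classes, copying, and so on).

```python
SMALL_REQUEST_THRESHOLD = 512   # assumed cutoff between "small" and "large" requests

class ToyHeap:
    """Tracks which allocator really owns each block (the ground truth)."""
    def __init__(self):
        self.owner = {}          # address -> "obmalloc" or "system"
        self._next = 0x1000

    def malloc(self, size):
        addr, self._next = self._next, self._next + 0x1000
        self.owner[addr] = "obmalloc" if size <= SMALL_REQUEST_THRESHOLD else "system"
        return addr

    def realloc(self, addr, new_size):
        if self.owner[addr] == "system":
            return addr          # block stays with system malloc, even if now "small"
        if new_size <= SMALL_REQUEST_THRESHOLD:
            return addr          # still small: stays with obmalloc
        del self.owner[addr]     # grew past the threshold: move to a system block
        return self.malloc(new_size)

def owner_guessed_from_size(size):
    """What a free(ptr, size) would conclude if it trusted the size alone."""
    return "obmalloc" if size <= SMALL_REQUEST_THRESHOLD else "system"

heap = ToyHeap()
p = heap.malloc(4000)            # "large": really obtained from system malloc
p = heap.realloc(p, 100)         # shrunk to a "small" size, same owner
print(heap.owner[p])                 # system
print(owner_guessed_from_size(100))  # obmalloc: the wrong allocator would free it
```

The mismatch between the two printed values is the confusion the footnote describes: the current size alone does not say who originally obtained the block.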