On 2019-06-06, Tim Peters wrote:
> Like now: if the size were passed in, obmalloc could test the size
> instead of doing the `address_in_range()` dance(*). But if it's ever
> possible that the size won't be passed in, all the machinery
> supporting `address_in_range()` still needs to be there, and every
> obmalloc spelling of malloc/realloc needs to ensure that machinery
> will work if the returned address is passed back to an obmalloc
> free/realloc spelling without the size.
We can almost make it work for GC objects; the use of obmalloc is
quite well encapsulated. I think I intentionally designed the
PyObject_GC_New/PyObject_GC_Del/etc APIs that way.
Quick and dirty experiment is here:
https://github.com/nascheme/cpython/tree/gc_malloc_free_size
The major hitch seems to be my new gc_obj_size() function. We can't be
sure the 'nbytes' passed to _PyObject_GC_Malloc() is the same as
what is computed by gc_obj_size(). It usually works but there are
exceptions (freelists for frame objects and tuple objects, for one).
A nasty problem is the weirdness with PyType_GenericAlloc() and the
sentinel item. _PyObject_GC_NewVar() doesn't include space for the
sentinel but PyType_GenericAlloc() does. When you get to
gc_obj_size(), you don't know if you should use "nitems" or "nitems+1".
I'm not sure how to fix the sentinel issue. Maybe a new type slot
or a type flag? In any case, making a change like my git branch
above would almost certainly break extensions that don't play
nicely. It won't be hard to make it a build option, like the
original gcmodule was. Then, assuming there is a performance boost,
people can enable it if their extensions are friendly.
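
To make the sentinel mismatch concrete, here is a minimal C sketch of the two size computations. It is a model, not CPython source: `type_sizes`, `round_up()`, `var_size()`, `size_from_gc_newvar()` and `size_from_generic_alloc()` are hypothetical names, `var_size()` only approximates what `_PyObject_VAR_SIZE()` does (basic size plus items, rounded up to pointer alignment), and the GC header is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two PyTypeObject fields that matter here. */
typedef struct {
    size_t basicsize;  /* models tp_basicsize */
    size_t itemsize;   /* models tp_itemsize */
} type_sizes;

/* Round n up to a multiple of align, which must be a power of two. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Approximates _PyObject_VAR_SIZE(): header plus items, pointer-aligned. */
static size_t var_size(const type_sizes *tp, size_t nitems)
{
    return round_up(tp->basicsize + nitems * tp->itemsize, sizeof(void *));
}

/* Size requested on the _PyObject_GC_NewVar() path: no sentinel item. */
static size_t size_from_gc_newvar(const type_sizes *tp, size_t nitems)
{
    return var_size(tp, nitems);
}

/* Size requested on the PyType_GenericAlloc() path: one extra item. */
static size_t size_from_generic_alloc(const type_sizes *tp, size_t nitems)
{
    return var_size(tp, nitems + 1);
}
```

For a type with a 24-byte basic size and 8-byte items, three items give 48 bytes on one path and 56 on the other. A gc_obj_size() that sees only the type and the item count cannot tell which allocation path produced the object, which is why some per-type marker (a slot or a flag) seems to be needed.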
> The "only" problem with address_in_range is that it limits us to a
> maximum pool size of 4K. Just for fun, I boosted that to 8K to see
> how likely segfaults really are, and a Python built that way couldn't
> even get to its first prompt before dying with an access violation
> (Windows-speak for segfault).
If we can make the above idea work, you could set the pool size to
8K without issue. A possible problem is that the obmalloc and
gcmalloc arenas are separate. I suppose that affects
performance testing.
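
A toy C sketch of why a known size removes the pool size limit: when the caller passes the size to free, routing is a single comparison and nothing needs to read a pool header at a rounded-down address. Everything here is hypothetical (`toy_malloc()`, `toy_free_sized()`, the bump-style `toy_pool`); the 512-byte threshold is modeled on obmalloc's SMALL_REQUEST_THRESHOLD. It also assumes small requests never fall back to the system allocator, which the real obmalloc does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_THRESHOLD 512    /* modeled on SMALL_REQUEST_THRESHOLD */
#define TOY_POOL_BYTES  8192   /* assumed; the routing below does not depend on it */

static unsigned char toy_pool[TOY_POOL_BYTES];
static size_t toy_pool_used;
static size_t pool_frees;      /* blocks routed back to the toy pool */
static size_t system_frees;    /* blocks routed to free() */

/* The whole routing decision: no address inspection needed. */
static int is_small(size_t nbytes)
{
    return nbytes > 0 && nbytes <= SMALL_THRESHOLD;
}

static void *toy_malloc(size_t nbytes)
{
    if (is_small(nbytes)) {
        size_t need = (nbytes + 15) & ~(size_t)15;  /* 16-byte granularity */
        /* The toy aborts rather than falling back to malloc(), so that
           the size alone always identifies where a block came from. */
        assert(toy_pool_used + need <= TOY_POOL_BYTES);
        void *p = toy_pool + toy_pool_used;
        toy_pool_used += need;
        return p;
    }
    return malloc(nbytes);
}

/* Sized free: the caller supplies the size it originally requested. */
static void toy_free_sized(void *p, size_t nbytes)
{
    if (p == NULL) {
        return;
    }
    if (is_small(nbytes)) {
        pool_frees++;          /* the bump pool never reuses memory */
    }
    else {
        system_frees++;
        free(p);
    }
}
```

The catch is the one raised earlier in the thread: this only works if every caller reports exactly the size it asked for, and if no free or realloc path ever arrives without a size.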
> We could eliminate the pool size restriction in many ways. For
> example, we could store the addresses obtained from the system
> malloc/realloc - but not yet freed - in a set, perhaps implemented as
> a radix tree to cut the memory burden. But digging through 3 or 4
> levels of a radix tree to determine membership is probably
> significantly slower than address_in_range.
You are likely correct. I'm hoping to benchmark the radix tree idea.
I'm not too far from having it working such that it can replace
address_in_range(). Maybe allocating gc_refs as a block would
offset the radix tree cost vs address_in_range(). If the above idea
works, we know the object size at free() and realloc(), so we don't
need address_in_range() for those code paths.
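
For reference, a rough C sketch of the kind of radix tree membership test being discussed. It is not the code from any branch. It assumes tracked blocks are 256 KiB and aligned, that only the low 48 address bits are significant (higher bits would alias), and that `calloc()` zero bytes read back as null pointers, which holds on common platforms. All names (`radix_mark()`, `radix_contains()`, the node types) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_BITS 18                 /* assumed: 256 KiB aligned blocks */
#define LEVEL_BITS 10                 /* 3 levels x 10 bits + 18 = 48 bits */
#define FANOUT     (1u << LEVEL_BITS)
#define LEVEL_MASK (FANOUT - 1)

typedef struct { unsigned char present[FANOUT]; } leaf_node;
typedef struct { leaf_node *leaf[FANOUT]; } mid_node;
typedef struct { mid_node *mid[FANOUT]; } root_node;

static root_node radix_root;          /* zero-initialized: the empty set */

/* Split the block number of p into one index per tree level. */
static void split_key(const void *p, unsigned *i1, unsigned *i2, unsigned *i3)
{
    uint64_t key = (uint64_t)(uintptr_t)p >> BLOCK_BITS;
    *i3 = (unsigned)(key & LEVEL_MASK);
    *i2 = (unsigned)((key >> LEVEL_BITS) & LEVEL_MASK);
    *i1 = (unsigned)((key >> (2 * LEVEL_BITS)) & LEVEL_MASK);
}

/* Set or clear the block containing p. Returns 0 on success, -1 on OOM. */
static int radix_mark(const void *p, int present)
{
    unsigned i1, i2, i3;
    split_key(p, &i1, &i2, &i3);

    mid_node *mid = radix_root.mid[i1];
    if (mid == NULL) {
        if (!present) return 0;       /* nothing to clear */
        mid = calloc(1, sizeof *mid);
        if (mid == NULL) return -1;
        radix_root.mid[i1] = mid;
    }
    leaf_node *leaf = mid->leaf[i2];
    if (leaf == NULL) {
        if (!present) return 0;
        leaf = calloc(1, sizeof *leaf);
        if (leaf == NULL) return -1;
        mid->leaf[i2] = leaf;
    }
    leaf->present[i3] = present ? 1 : 0;
    return 0;
}

/* Membership test: up to three dependent loads, one per level. */
static int radix_contains(const void *p)
{
    unsigned i1, i2, i3;
    split_key(p, &i1, &i2, &i3);

    const mid_node *mid = radix_root.mid[i1];
    if (mid == NULL) return 0;
    const leaf_node *leaf = mid->leaf[i2];
    if (leaf == NULL) return 0;
    return leaf->present[i3];
}
```

The three dependent loads in radix_contains() are the cost being weighed against address_in_range(), which reads a single pool header. Unlike that check, the lookup never touches memory the allocator does not own, so it puts no constraint on pool size.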
Regards,
Neil