[Antoine Pitrou <solip...@pitrou.net>]
> The interesting thing here is that in many situations, the size is
> known up front when deallocating - it is simply not communicated to the
> deallocator because the traditional free() API takes a sole pointer,
> not a size.  But CPython could communicate that size easily if we
> would like to change the deallocation API.  Then there's no bother
> looking up the allocated size in sophisticated lookup structures.

That could work (to make it possible to increase obmalloc's pool
size).  Except ...

> I'll note that jemalloc provides such APIs:
> http://jemalloc.net/jemalloc.3.html
>
> """The dallocx() function causes the memory referenced by ptr to be
> made available for future allocations.
>
> The sdallocx() function is an extension of dallocx() with a size
> parameter to allow the caller to pass in the allocation size as an
> optimization."""

obmalloc doesn't intend to be a general-purpose allocator - it only
aims at optimizing "small" allocations, punting to the system for
everything beyond that.  Unless the size is _always_ passed in (on
every free() and realloc() spelling it supports), an "optimization"
doesn't help it much.  It needs a bulletproof way to determine whether
it, or system malloc/realloc, originally obtained an address passed
in.  If the size is always passed in, no problem (indeed, a single bit
could suffice).  But if it's ever possible that the size won't be
passed in, all the runtime machinery to figure that out on its own
needs to be able to handle all addresses.

Like now:  if the size were passed in, obmalloc could test the size
instead of doing the `address_in_range()` dance(*).  But if it's ever
possible that the size won't be passed in, all the machinery
supporting `address_in_range()` still needs to be there, and every
obmalloc spelling of malloc/realloc needs to ensure that machinery
will work if the returned address is passed back to an obmalloc
free/realloc spelling without the size.

The "only"problem with address_in_range is that it limits us to a
maximum pool size of 4K.  Just for fun, I boosted that to 8K to see
how likely segfaults really are, and a Python built that way couldn't
even get to its first prompt before dying with an access violation
(Windows-speak for segfault).

Alas, that makes it hard to guess how much value there would be for
Python programs if the pool size could be increased - can't even get
Python started.

We could eliminate the pool size restriction in many ways.  For
example, we could store the addresses obtained from the system
malloc/realloc - but not yet freed - in a set, perhaps implemented as
a radix tree to cut the memory burden.  But digging through 3 or 4
levels of a radix tree to determine membership is probably
significantly slower than address_in_range.

I can think of a way to do it slightly faster than (but related to)
address_in_range, but it would (on a 64-bit box) require adding 24
more bytes for each system-malloc/realloc allocation.  8 of those
bytes would be pure waste, due to that the Alignment Gods appear to
require 16-byte alignment for every allocation on a 64-bit box now.

In stark contrast, the extra memory burden of the current
address_in_range is an insignificant 8 bytes per _arena_ (256 KB, and
"should be" larger now).

Another approach:  keep address_as_range as-is, but add new internal
structure to larger pools, to repeat the arena index every 4KB.  But
that fights somewhat against the goal of larger pools.

Etc. ;-)

(*) Actually not quite true.  If a "large" object is obtained from
obmalloc now (meaning it actually came from the system malloc), then
cut back to a "small" size by a realloc, it _remains_ under the
control of the system malloc now.  Passing in the newer "small" size
to a free() later would cause obmalloc to get fatally confused about
that.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/

Reply via email to