To be clearer, while knowing the size of allocated objects may be of
some use to some other allocators, "not really" for obmalloc.  That
one does small objects by itself in a uniform way, and punts
everything else to the system malloc family.  The _only_ thing it
wants to know on a free/realloc is which of those two ways delivered
the address to begin with.

Knowing the original size would be an indirect way to accomplish that,
but it really only needs 1 bit of info.  If you're using a radix tree
to implement - in effect - a dict mapping addresses to "stuff", 1 flag
bit in the "stuff" would be ideal for that.  If you haven't already, I
assume you'll soon come around to believing you really should be
tracking the addresses of all gc'ed objects (not just the "small"
ones).
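
To make the shape of that concrete, here's a minimal sketch (in C,
and emphatically not CPython's actual code) of a two-level radix tree
keyed on the page portion of an address, where the "stuff" is just
that one flag bit.  The level widths assume 48 usable address bits
and 4K pages; every name here is illustrative:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdlib.h>

    #define PAGE_BITS 12                      /* 4 KiB pages */
    #define L1_BITS   18
    #define L2_BITS   18                      /* 12 + 18 + 18 = 48 */

    typedef struct {
        uint8_t from_obmalloc;                /* the single bit of "stuff" */
    } page_info;

    typedef struct {
        page_info pages[(size_t)1 << L2_BITS];
    } leaf;

    static leaf *root[(size_t)1 << L1_BITS];  /* leaves allocated lazily */

    static page_info *page_lookup(uintptr_t addr, int create)
    {
        size_t i1 = (addr >> (PAGE_BITS + L2_BITS)) &
                    (((size_t)1 << L1_BITS) - 1);
        size_t i2 = (addr >> PAGE_BITS) & (((size_t)1 << L2_BITS) - 1);
        if (root[i1] == NULL) {
            if (!create)
                return NULL;
            root[i1] = calloc(1, sizeof(leaf));
            if (root[i1] == NULL)
                return NULL;
        }
        return &root[i1]->pages[i2];
    }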

> If the above idea works, we know the object size at free() and realloc(), so
> we don't need address_in_range() for those code paths.

Two things:  first, a single bit would not only be sufficient, it
would be _better_ than knowing the object size.  If the bit says the
object came from the system, the size is of no use at all.  If the bit
says it came from obmalloc, what's _really_ wanted is the index of the
object's "size class" in the vector of free lists, and that's already
directly available in the object's pool header (various parts of which
have to be read and/or updated regardless).  Second, if that bit were
available, `address_in_range()` could be thrown away - obmalloc's free
and realloc entries are the only places it's used (or is of any
conceivable use to obmalloc).
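
A sketch of what the free() entry could look like under that scheme,
reusing page_lookup() from the earlier sketch; the pool rounding and
free-list push are only hinted at in comments, and none of this is
the real obmalloc code:

    void sketch_free(void *p)
    {
        page_info *info = page_lookup((uintptr_t)p, 0);
        if (info == NULL || !info->from_obmalloc) {
            free(p);   /* the system malloc delivered this address */
            return;
        }
        /* obmalloc delivered it: round p down to its pool, read the
         * size-class index from the pool header (which has to be
         * touched anyway), and push the block onto that size class's
         * free list. */
    }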

For the current obmalloc, I have in mind a different way (briefly:
let a pool span a number of 4K pages, but teach pools about the page
structure so that address_in_range() continues to work after trivial
changes (read the arena index from the containing page's start rather
than from the containing pool's)).  Not ideal, but it looks remarkably
easy to do, and captures the important part (more objects in a pool ->
more times obmalloc can remain in its fastest "all within the pool"
paths).
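
Here's a rough sketch of that "read the arena index from the page's
start" idea; check_arena() stands in for the usual "index is less
than maxarenas, and the address falls inside that arena" test, and
all the names are made up:

    #include <stdint.h>

    #define SYSTEM_PAGE_MASK ((uintptr_t)(4096 - 1))

    typedef struct {
        uint32_t arenaindex;   /* replicated at the start of every
                                * page a multi-page pool covers */
    } page_head;

    extern int check_arena(uint32_t idx, const void *p);  /* hypothetical */

    static int address_in_range_sketch(const void *p)
    {
        /* round down to the containing page, not the containing pool */
        const page_head *page =
            (const page_head *)((uintptr_t)p & ~SYSTEM_PAGE_MASK);
        return check_arena(page->arenaindex, p);
    }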

> My gut feeling is that the prev/next pointer updates done by
> move_unreachable() and similar functions must be quite expensive.
> Doing the traversal with an explicit stack is a lot less elegant but
> I think it should be faster.  At least, when you are dealing with a
> big set of GC objects that don't fit in the CPU cache.

At the start, it was the _reachable_ objects that were moved out of
the collected generation list rather than the unreachable objects.
Since it's (in all my experience) almost all objects that survive
almost all collections in almost all programs, that was an enormous
amount of moving overall.  So, long ago, I rewrote that code to move
the _un_reachable objects instead.
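
The core of that rewrite is the usual doubly-linked-list surgery,
applied to the (typically few) unreachable objects instead of the
(typically many) survivors.  A sketch, mirroring CPython's gc_list_*
helpers in spirit only:

    typedef struct node {
        struct node *prev, *next;
    } node;

    static void move_to_unreachable(node *item, node *unreachable)
    {
        /* unlink from the generation being collected */
        item->prev->next = item->next;
        item->next->prev = item->prev;
        /* append at the tail of the circular "unreachable" list,
         * where `unreachable` is the list's sentinel header */
        item->prev = unreachable->prev;
        item->next = unreachable;
        unreachable->prev->next = item;
        unreachable->prev = item;
    }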

Debug output showed countless billions of pointer updates saved across
the programs I tried at the time, but the timing difference was in the
noise.

Probably a big part of "the problem":  when collecting the oldest
generation in a program with "a lot" of objects, no approach at all
will "fit in cache".  We have to crawl the entire object graph then.
Note that the test case in the parent thread had over a billion class
instances, and it was way whittled down(!) from the OP's real program.

But I don't _think_ that's typical quite yet ;-) , and:

> You are likely correct. I'm hoping to benchmark the radix tree idea.
> I'm not too far from having it working such that it can replace
> address_in_range().  Maybe allocating gc_refs as a block would
> offset the radix tree cost vs address_in_range().

I certainly agree gc_refs is the big-bang-for-the-buck thing to
target.  There's no way to avoid mutating that bunches and bunches of
times, even in small programs, and reducing the number of dirtied
cache lines due to that "should" pay off.
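
For instance (purely a sketch, with made-up names), the gc_refs
copies could live in one packed side array instead of inside each
object's GC header.  With 64-byte cache lines and 4-byte counts, a
pass that decrements every count then dirties one cache line per 16
objects instead of one per object:

    #include <stdlib.h>
    #include <stdint.h>

    typedef struct {
        int32_t *refs;   /* refs[i] shadows object i's gc_refs */
        size_t   n;
    } refs_block;

    static int refs_block_init(refs_block *b, size_t n_objects)
    {
        b->refs = malloc(n_objects * sizeof *b->refs);
        b->n = n_objects;
        return b->refs == NULL ? -1 : 0;
    }

    static void refs_block_free(refs_block *b)
    {
        free(b->refs);
        b->refs = NULL;
        b->n = 0;
    }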