On 02/14/2017 03:22 AM, Andres Freund wrote:
On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:
On 02/11/2017 02:33 AM, Andres Freund wrote:
I have a hard time believing this the cache efficiency of linked lists
(which may or may not be real in this case) out-weights this, but if you
want to try, be my guest.
I'm not following - why would there be overhead in anything for
allocations bigger than 4 (or maybe 8) bytes? You can store the list
(via chunk ids, not pointers) inside the chunks itself, where data
otherwise would be. And I don't see why you'd need a doubly linked
list, as the only operations that are needed are to push to the front of
the list, and to pop from the front of the list - and both operations
are simple to do with a singly linked list?
Oh! I have not considered storing the chunk indexes (for linked lists) in
the chunk itself, which obviously eliminates the overhead concerns, and
you're right a singly-linked list should be enough.
There will be some minimum-chunk-size requirement, depending on the block
size/chunk size. I wonder whether it makes sense to try to be smart and make
it dynamic, so that we only require 1B or 2B for cases when only that many
chunks fit into a block, or just say that it's 4B and be done with it.
I doubt it's worth it - it seems likely that the added branch is more
noticeable overall than the possible savings of 3 bytes. Also, won't the
space be lost due to alignment *anyway*?
+ /* chunk, including SLAB header (both addresses nicely aligned) */
+ fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
use a plain slist - no point in being more careful than that.
Hmm, I think you're right.
I mean 2^32 chunks ought to be enough for anyone, right?
Yea, that seems enough; but given the alignment thing pointed out above,
I think we can just use plain pointers - and that definitely should be
People in year 2078: Why the hell did they only use 32 bits? Wasn't it
obvious we'll have tiny computers with 32EB of RAM? ;-)
I'm still not buying the cache efficiency argument, though. One of
the reasons is that the implementation prefers blocks with fewer
free chunks when handling palloc(), so pfree() is making the block
less likely to be chosen by the next palloc().
That'll possibly de-optimize L1, but for L2 usage the higher density
seems like it'll be a win. All this memory is only accessed by a
single backend, so packing as densely as possible is good.
If so, if you have e.g. 8 byte allocations and 64kb sized blocks,
you end up with an array of 1024 doubly linked lists, which'll
take up 64kb on its own. And there a certainly scenarios where
even bigger block sizes could make sense. That's both memory
overhead, and runtime overhead, because at reset-time we'll have
to check the whole array (which'll presumably largely be empty
lists). Resetting is a pretty common path...
True, but it's not entirely clear if resetting is common for the
paths where we use those new allocators.
That's fair enough. There's still the memory overhead, but I guess
we can also live with that.
Right. My ambition was not to develop another general-purpose memory
context that would work perfectly for everything, but something that
works (better than the current code) for places like reorderbuffer.
Also, if we accept that it might be a problem, what other solution you
propose? I don't think just merging everything into a single list is a good
idea, for the reasons I explained before (it might make the resets somewhat
less expensive, but it'll make pfree() more expensive).
Now that I think about it, a binary heap, as suggested elsewhere, isn't
entirely trivial to use for this - it's more or less trivial to "fix"
the heap after changing an element's value, but it's harder to find that
But a two-level list approach seems like it could work quite well -
basically a simplified skip-list. A top-level list contains that has
the property that all the elements have a distinct #free, and then
hanging of those sub-lists for all the other blocks with the same number
I think that'd be a better implementation, but I can understand if you
don't immediately want to go there.
I don't want to go there. I'm not all that interested in reorderbuffer,
to be honest, and this started more as "Hold my beer!" hack, after a
midnight discussion with Petr, than a seriously meant patch. I've
already spent like 100x time on it than I expected.
What might work is replacing the array with a list, though. So we'd have a
list of lists, which would eliminate the array overhead.
That seems likely to be significantly worse, because a) iteration is
more expensive b) accessing the relevant list to move between two
different "freecount" lists would be O(n).
Oh, right, I haven't realized we won't know the current head of the
list, so we'd have to search for it. OTOH, we could replace it with a
small hash table, which would reduce the lookup time because we'd have
to search only in a single bin.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: