Hi,

On 2021-07-17 12:43:33 -0700, Andres Freund wrote:
> 2) SlabChunkIndex() in SlabFree() is slow. It requires a 64bit division, taking
> up ~50% of the cycles in SlabFree(). A 64bit div, according to [1], has a
> latency of 35-88 cycles on skylake-x (and a reciprocal throughput of 21-83,
> i.e. no parallelism). While it's getting a bit faster on icelake / zen 3, it's
> still slow enough there to be very worrisome.
> 
> I don't see a way to get around the division while keeping the freelist
> structure as is. But:
> 
> ISTM that we only need the index because of the free-chunk list, right? Why
> don't we make the chunk list use actual pointers? Is the concern that that'd
> increase the minimum allocation size?  If so, I see two ways around that:
> First, we could make the index just the offset from the start of the block,
> that's much cheaper to calculate. Second, we could store the next pointer in
> SlabChunk->slab/block instead (making it a union) - while on the freelist we
> don't need to dereference those, right?
> 
> I suspect both would also make the block initialization a bit cheaper.
> 
> That should also accelerate SlabBlockGetChunk(), which currently shows up as
> an imul, which isn't exactly fast either (and uses a lot of execution ports).
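
To make the first option quoted above concrete, here is a rough sketch (not
the actual slab.c code; DemoBlock and all demo_* names are made up for
illustration). It contrasts the division needed for a chunk index with the
plain subtraction needed for a byte offset, and uses the offset as the
free-list link stored in the freed chunk itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified block; NOT PostgreSQL's real SlabBlock. */
typedef struct DemoBlock
{
    size_t  full_chunk_size;    /* header + payload size of each chunk */
    int32_t first_free_offset;  /* byte offset of first free chunk, -1 if none */
    char   *chunks;             /* start of the chunk area */
} DemoBlock;

/* Index-based lookup: needs a division by a runtime value. */
static inline size_t
demo_chunk_index(const DemoBlock *block, const void *chunk)
{
    return (size_t) ((const char *) chunk - block->chunks) / block->full_chunk_size;
}

/* Offset-based lookup: a pointer subtraction, no division. */
static inline int32_t
demo_chunk_offset(const DemoBlock *block, const void *chunk)
{
    return (int32_t) ((const char *) chunk - block->chunks);
}

/* Push a freed chunk; the old list head is stored in the chunk's first bytes. */
static inline void
demo_free_chunk(DemoBlock *block, void *chunk)
{
    assert(chunk != NULL);
    memcpy(chunk, &block->first_free_offset, sizeof(int32_t));
    block->first_free_offset = demo_chunk_offset(block, chunk);
}

/* Pop a free chunk; an addition replaces the index * size multiplication. */
static inline void *
demo_alloc_chunk(DemoBlock *block)
{
    char   *chunk;

    if (block->first_free_offset < 0)
        return NULL;
    chunk = block->chunks + block->first_free_offset;
    memcpy(&block->first_free_offset, chunk, sizeof(int32_t));
    return chunk;
}
```

In this sketch neither the free nor the alloc path divides or multiplies by
the chunk size; whether that carries over to the real code is exactly the
question being raised.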

Oh - I just saw that effectively the allocation size already is a
uintptr_t at minimum. I had only seen

        /* Make sure the linked list node fits inside a freed chunk */
        if (chunkSize < sizeof(int))
                chunkSize = sizeof(int);

but it's followed by

        /* chunk, including SLAB header (both addresses nicely aligned) */
        fullChunkSize = sizeof(SlabChunk) + MAXALIGN(chunkSize);

which means we already reserve enough space for a pointer on just about
any platform? Seems we can just make that official and reserve space for
a pointer when rounding up chunkSize, instead of getting it implicitly
via the MAXALIGN() in the fullChunkSize computation?
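
A minimal sketch of what "making it official" could look like, under
stated assumptions: DEMO_MAXALIGN is a stand-in for MAXALIGN() that assumes
8-byte maximum alignment, and DemoFreeChunk and the demo_* functions are
hypothetical names, not anything in slab.c. The chunk size is rounded up to
at least sizeof(void *), and the free list links freed chunks with real
pointers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's MAXALIGN(); assumes 8-byte maximum alignment. */
#define DEMO_MAXALIGN(len) (((uintptr_t) (len) + 7) & ~(uintptr_t) 7)

/* Hypothetical free-list node living in the payload of a freed chunk. */
typedef struct DemoFreeChunk
{
    struct DemoFreeChunk *next;
} DemoFreeChunk;

/* Round the requested size up so that a freed chunk can hold a pointer. */
static inline size_t
demo_round_chunk_size(size_t chunk_size)
{
    if (chunk_size < sizeof(void *))
        chunk_size = sizeof(void *);
    return (size_t) DEMO_MAXALIGN(chunk_size);
}

/* Push a freed chunk onto the list; no index or division involved. */
static inline void
demo_push(DemoFreeChunk **head, void *chunk)
{
    DemoFreeChunk *node = (DemoFreeChunk *) chunk;

    assert(node != NULL);
    node->next = *head;
    *head = node;
}

/* Pop the most recently freed chunk, or return NULL if the list is empty. */
static inline void *
demo_pop(DemoFreeChunk **head)
{
    DemoFreeChunk *node = *head;

    if (node != NULL)
        *head = node->next;
    return node;
}
```

With 8-byte alignment the rounded size is the same as what the existing
sizeof(int) check plus MAXALIGN() yields for small requests, so the sketch
only spells out the guarantee rather than growing any chunk.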

Greetings,

Andres Freund

