Hi,

On 2021-07-17 12:43:33 -0700, Andres Freund wrote:
> 2) SlabChunkIndex() in SlabFree() is slow. It requires a 64bit division,
> taking up ~50% of the cycles in SlabFree(). A 64bit div, according to [1],
> has a latency of 35-88 cycles on skylake-x (and a reciprocal throughput of
> 21-83, i.e. no parallelism). While it's getting a bit faster on icelake /
> zen 3, it's still slow enough there to be very worrisome.
>
> I don't see a way to get around the division while keeping the freelist
> structure as is. But:
>
> ISTM that we only need the index because of the free-chunk list, right? Why
> don't we make the chunk list use actual pointers? Is the concern that that'd
> increase the minimum allocation size? If so, I see two ways around that:
> First, we could make the index just the offset from the start of the block;
> that's much cheaper to calculate. Second, we could store the next pointer in
> SlabChunk->slab/block instead (making it a union) - while on the freelist we
> don't need to dereference those, right?
>
> I suspect both would also make the block initialization a bit cheaper.
>
> That should also accelerate SlabBlockGetChunk(), which currently shows up as
> an imul, which isn't exactly fast either (and uses a lot of execution ports).
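To make the trade-off above concrete, here is a minimal sketch, not slab.c itself: a single toy block of fixed-size chunks with no SlabChunk header, where all names (ToyBlock, toy_chunk_index, toy_chunk_offset, toy_alloc, toy_free, TOY_BUF_SIZE) are hypothetical. It contrasts recovering an index from a chunk pointer, which needs a division by a runtime chunk size (roughly the situation SlabChunkIndex() is in), with an offset (a subtraction only) and with an intrusive freelist that keeps the next pointer inside the freed chunk, so freeing is just two stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 1024		/* hypothetical block payload size */

/* Hypothetical toy block; not PostgreSQL's SlabBlock. */
typedef struct ToyBlock
{
	size_t		chunksize;		/* runtime value, so the division can't be strength-reduced */
	int			nchunks;
	int			nfree;
	void	   *freehead;		/* intrusive freelist head, NULL if block is full */
	union
	{
		void	   *align;		/* forces pointer alignment of bytes[] */
		char		bytes[TOY_BUF_SIZE];
	}			mem;
} ToyBlock;

/* Index from pointer: needs a division by the (runtime) chunk size. */
static int
toy_chunk_index(const ToyBlock *block, const void *chunk)
{
	return (int) ((size_t) ((const char *) chunk - block->mem.bytes) / block->chunksize);
}

/* Offset from block start: just a subtraction, no division. */
static size_t
toy_chunk_offset(const ToyBlock *block, const void *chunk)
{
	return (size_t) ((const char *) chunk - block->mem.bytes);
}

/* Thread every chunk onto the freelist; "next" is stored inside each free chunk. */
static void
toy_block_init(ToyBlock *block, size_t chunksize)
{
	assert(chunksize >= sizeof(void *) && chunksize % sizeof(void *) == 0);
	block->chunksize = chunksize;
	block->nchunks = (int) (TOY_BUF_SIZE / chunksize);
	block->nfree = block->nchunks;
	block->freehead = NULL;
	/* push in reverse order so the first allocation returns chunk 0 */
	for (int i = block->nchunks - 1; i >= 0; i--)
	{
		char	   *chunk = block->mem.bytes + (size_t) i * chunksize;

		memcpy(chunk, &block->freehead, sizeof(void *));	/* chunk->next = head */
		block->freehead = chunk;
	}
}

/* Pop the head; the next pointer is read from the chunk's own first bytes. */
static void *
toy_alloc(ToyBlock *block)
{
	void	   *chunk = block->freehead;

	if (chunk == NULL)
		return NULL;			/* block is full */
	memcpy(&block->freehead, chunk, sizeof(void *));
	block->nfree--;
	return chunk;
}

/* Push onto the freelist: no index needed, hence no division. */
static void
toy_free(ToyBlock *block, void *chunk)
{
	memcpy(chunk, &block->freehead, sizeof(void *));
	block->freehead = chunk;
	block->nfree++;
}
```

The memcpy() calls are only there to keep the pointer stores well-defined inside a char buffer; a real allocator would more likely overlay a struct on the chunk.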
Oh - I just saw that the allocation size is already effectively at least a uintptr_t. I had only seen

	/* Make sure the linked list node fits inside a freed chunk */
	if (chunkSize < sizeof(int))
		chunkSize = sizeof(int);

but it's followed by

	/* chunk, including SLAB header (both addresses nicely aligned) */
	fullChunkSize = sizeof(SlabChunk) + MAXALIGN(chunkSize);

which means we are already reserving enough space for a pointer on just about any platform? Seems we can just make that official and reserve space for a pointer as part of rounding up the chunk size, instead of relying on the MAXALIGN in fullChunkSize?

Greetings,

Andres Freund
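As a rough illustration of the point above, the sketch below compares the current rounding (reserve an int, then MAXALIGN when computing the full size) with making the pointer requirement explicit in the chunk size itself. It assumes a MAXIMUM_ALIGNOF of 8 and a 16-byte chunk header; TOY_MAXALIGN, TOY_CHUNK_HEADER_SIZE and the two toy_full_chunk_size_* functions are hypothetical stand-ins, not PostgreSQL definitions. Under those assumptions, and with pointers of at most 8 bytes, both variants yield the same full chunk size, which is why making it official should cost nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's MAXALIGN; MAXIMUM_ALIGNOF of 8 is an assumption. */
#define TOY_MAXIMUM_ALIGNOF 8
#define TOY_MAXALIGN(LEN) \
	(((uintptr_t) (LEN) + (TOY_MAXIMUM_ALIGNOF - 1)) & \
	 ~((uintptr_t) (TOY_MAXIMUM_ALIGNOF - 1)))

/* Hypothetical header size standing in for sizeof(SlabChunk). */
#define TOY_CHUNK_HEADER_SIZE 16

/* Current scheme: only an int is reserved explicitly; MAXALIGN pads the rest. */
static size_t
toy_full_chunk_size_current(size_t chunkSize)
{
	if (chunkSize < sizeof(int))
		chunkSize = sizeof(int);
	return TOY_CHUNK_HEADER_SIZE + TOY_MAXALIGN(chunkSize);
}

/* Proposed scheme: reserve room for a freelist pointer while rounding the chunk size. */
static size_t
toy_full_chunk_size_proposed(size_t chunkSize)
{
	if (chunkSize < sizeof(void *))
		chunkSize = sizeof(void *);
	chunkSize = TOY_MAXALIGN(chunkSize);
	return TOY_CHUNK_HEADER_SIZE + chunkSize;
}
```

For a 1-byte request both variants give 16 + 8 = 24 bytes under these assumptions; they would only diverge on a platform where pointers are wider than MAXIMUM_ALIGNOF.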