Hi,

On 2023-02-17 17:26:20 +1300, David Rowley wrote:
> I didn't hear it mentioned explicitly here, but I suspect it's faster
> when increasing the initial size due to the memory context caching
> code that reuses aset MemoryContexts (see context_freelists[] in
> aset.c). Since we reset the context before caching it, then it'll
> remain fast when we can reuse a context, provided we don't need to do
> a malloc for an additional block beyond the initial block that's kept
> in the cache.
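As a rough illustration of the reuse scheme described above, here is a
minimal sketch (names and structure are hypothetical simplifications;
the real context_freelists[] in aset.c is keyed by context size class,
and contexts are reset before being cached):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified sketch of aset.c-style context caching:
 * deleted contexts are pushed onto a process-local freelist instead of
 * being freed, and a later creation with a matching initial block size
 * pops one off, reusing the kept initial block without calling
 * malloc() again. */
typedef struct MemoryContextSketch
{
    struct MemoryContextSketch *next;   /* freelist link */
    size_t      initBlockSize;          /* size of kept initial block */
    char       *initBlock;              /* the kept (reset) block */
} MemoryContextSketch;

static MemoryContextSketch *context_freelist = NULL;

static MemoryContextSketch *
context_create(size_t initBlockSize)
{
    if (context_freelist &&
        context_freelist->initBlockSize == initBlockSize)
    {
        /* Fast path: reuse a cached, already-reset context. */
        MemoryContextSketch *ctx = context_freelist;

        context_freelist = ctx->next;
        return ctx;
    }

    /* Slow path: allocate a fresh context and its initial block. */
    MemoryContextSketch *ctx = malloc(sizeof(*ctx));

    ctx->initBlockSize = initBlockSize;
    ctx->initBlock = malloc(initBlockSize);
    ctx->next = NULL;
    return ctx;
}

static void
context_delete(MemoryContextSketch *ctx)
{
    /* Instead of freeing, (notionally reset and) cache for reuse. */
    ctx->next = context_freelist;
    context_freelist = ctx;
}
```

Note the caveat in the quoted text: the fast path only helps when a
cached context of the right size is available and no additional blocks
beyond the kept initial block are needed.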
I'm not so sure this is the case. Which is one of the reasons I'd
really like to see a) memory context stats for the executor context,
b) a CPU profile of the problem, c) a reproducer.

Jonah, did you just increase the initial size, or did you potentially
also increase the maximum block size? And did you increase
ALLOCSET_DEFAULT_INITSIZE everywhere, or just pass a larger block size
in CreateExecutorState()? If the latter, the context freelist wouldn't
even come into play.

An 8MB max block size is pretty darn small if you have a workload that
ends up with gigabytes' worth of blocks.

The problem also could just be that the default initial block size
takes too long to ramp up to a reasonable block size. I think it's 20
blocks to get from ALLOCSET_DEFAULT_INITSIZE to
ALLOCSET_DEFAULT_MAXSIZE. Even if you allocate a good bit more than
8MB, having to additionally go through 20 smaller blocks is going to be
noticeable until you reach a good bit higher number of blocks.

> Maybe we should think of a more general-purpose way of doing this
> caching which just keeps a global-to-the-process dclist of blocks
> laying around. We could see if we have any free blocks both when
> creating the context and also when we need to allocate another block.

Not so sure about that. I suspect the problem could just as well be the
maximum block size, leading to too many blocks being allocated. Perhaps
we should scale that to a certain fraction of work_mem, by default?

Either way, I don't think we should go too deep without some data;
otherwise we're too likely to miss the actual problem.

> I see no reason why this couldn't be shared among the other context
> types rather than keeping this cache stuff specific to aset.c. slab.c
> might need to be pickier if the size isn't exactly what it needs, but
> generation.c should be able to make use of it the same as aset.c
> could.
> I'm unsure what we'd need in the way of size classing for this, but I
> suspect we'd need to pay attention to that rather than do things like
> hand over 16MBs of memory to some context that only wants a 1KB
> initial block.

Possible. I can see something like a generic "free block" allocator
being useful. Potentially with the underlying memory allocated via
larger mmap()s than we need for individual blocks.

Random note: I wonder if we should have a bitmap (in an int) in front
of aset's freelist. In a lot of cases we incur plenty of cache misses
just to find the freelist bucket empty.

Greetings,

Andres Freund
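To sketch that bitmap idea (a hypothetical illustration, not actual
aset.c code; ALLOCSET_NUM_FREELISTS is 11 there, and the chunk/bucket
layout below is simplified):

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_FREELISTS 11

/* Hypothetical chunk: the real aset.c freelist links chunks through
 * their headers; a bare next pointer suffices for the sketch. */
typedef struct ChunkSketch
{
    struct ChunkSketch *next;
} ChunkSketch;

/* One bit per bucket, kept in a single int: bit i is set iff
 * buckets[i] is non-empty.  Testing the (hot) mask avoids touching the
 * likely cache-cold buckets[] array just to find the bucket empty. */
typedef struct FreelistSketch
{
    uint32_t    nonempty_mask;
    ChunkSketch *buckets[NUM_FREELISTS];
} FreelistSketch;

static void
freelist_push(FreelistSketch *fl, int idx, ChunkSketch *chunk)
{
    chunk->next = fl->buckets[idx];
    fl->buckets[idx] = chunk;
    fl->nonempty_mask |= 1u << idx;
}

static ChunkSketch *
freelist_pop(FreelistSketch *fl, int idx)
{
    ChunkSketch *chunk;

    /* Single-int test: no cache miss on buckets[] when empty. */
    if ((fl->nonempty_mask & (1u << idx)) == 0)
        return NULL;

    chunk = fl->buckets[idx];
    fl->buckets[idx] = chunk->next;
    if (fl->buckets[idx] == NULL)
        fl->nonempty_mask &= ~(1u << idx);
    return chunk;
}
```

The maintenance cost is one OR on push and one AND-NOT on the pop that
empties a bucket, in exchange for skipping the bucket-array load on the
common "bucket is empty" case.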