On Wed, 2020-03-25 at 15:09 -0700, Andres Freund wrote:
> > The overhead comes from two places: (a) the chunk header; and (b) the
> > round-up-to-nearest-power-of-two behavior.
> >
> > Combining the group tuple and per-group states only saves the overhead
> > from (a); it does nothing to help (b), which is often bigger.
>
> Hm? It very well can help with b), since the round-up only happens once
> now? I guess you could argue that it's possible that afterwards we'd
> more likely to end in a bigger size class, and thus have roughly the
> same amount of waste due rounding? But I don't think that's all that
> convincing.
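To spell out the model I'm working from: this is only an illustrative
sketch, not the actual aset.c code (the 16-byte chunk header and the
8-byte minimum size class are assumptions for the example), but it is
roughly what a single palloc() costs in an AllocSet:

#include <stddef.h>

#define CHUNK_HEADER_SIZE 16	/* assumed per-chunk header size */
#define MIN_CHUNK_SIZE 8		/* assumed smallest size class */

static size_t
chunk_size_class(size_t size)
{
	size_t		result = MIN_CHUNK_SIZE;

	/* (b): round the request up to the next power-of-two size class */
	while (result < size)
		result <<= 1;
	return result;
}

static size_t
palloc_space_used(size_t size)
{
	/* (a): fixed per-chunk header, plus (b): the rounded-up chunk */
	return CHUNK_HEADER_SIZE + chunk_size_class(size);
}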
Why is that not convincing? Each size class is double the previous one,
so piling double the memory into a single allocation doesn't help at
all. Two palloc(20)s turn into two 32-byte chunks; one palloc(40) turns
into a 64-byte chunk. You might get lucky and the second chunk will fit
in the wasted space from the first chunk; but when it does cross a
boundary, it will be a bigger boundary and wipe out any efficiencies
that you gained previously.

Of course it depends on the exact distribution. But I don't see any
reason why we'd expect a distribution that would be favorable to
combining chunks together (except to avoid the chunk header, problem
(a)).

> I still, as I mentioned on the call, suspect that the right thing here
> is to use an allocation strategy that suffers from neither a nor b
> (for tuple and pergroup) and that has faster allocations too. That
> then also would have the consequence that we don't need to care about
> per-alloc overhead anymore (be it a or b).

It might make sense for the next release, but I'm wary of more churn in
nodeAgg.c at this stage. It's not a trivial change because the different
allocations happen in different places and combining them would be
tricky.

> > So do you generally favor the EstimateMemoryChunkSpace() patch (that
> > works for all context types)? Or do you have another suggestion? Or
> > do you think we should just do nothing?
>
> I think I'm increasingly leaning towards either using a constant
> overhead factor, or just getting rid of all memory context overhead.
> There's clearly no obviously correct design for the "chunk size"
> functions, and not having overhead is better than ~correctly
> estimating it.

Trying to actually eliminate the overhead sounds like v14 to me.

I believe the AllocSet overhead can be approximated with:

    16 + size/4

That would probably be better than a constant but a little hacky. I can
do that as a spot fix if this patch proves unpopular.

Regards,
	Jeff Davis
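P.S. The spot fix I have in mind would be about this simple. The
function name is hypothetical and the 16-byte header is an assumption,
so treat this as a sketch rather than an actual patch:

#include <stddef.h>

/*
 * Estimate the total space consumed by a palloc(size) in an AllocSet
 * without asking the context for the real chunk size: the request
 * itself, plus roughly 16 bytes of header and size/4 of rounding waste.
 */
static size_t
estimate_alloc_space(size_t size)
{
	return size + 16 + size / 4;
}

For a 100-byte request this gives 100 + 16 + 25 = 141 bytes, versus an
actual 16 + 128 = 144 bytes under the assumptions sketched earlier.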