On Wed, 2020-03-25 at 15:09 -0700, Andres Freund wrote:
> > The overhead comes from two places: (a) the chunk header; and (b)
> > the round-up-to-nearest-power-of-two behavior.
> > 
> > Combining the group tuple and per-group states only saves the
> > overhead from (a); it does nothing to help (b), which is often bigger.
> Hm? It very well can help with b), since the round-up only happens
> once now? I guess you could argue that it's possible that afterwards
> we'd more likely to end in a bigger size class, and thus have roughly
> the same amount of waste due rounding? But I don't think that's all
> that convincing.

Why is that not convincing? Each size class is double the previous one,
so piling double the memory into a single allocation doesn't help at
all. Two palloc(20)s turn into two 32-byte chunks; one palloc(40) turns
into a 64-byte chunk.

You might get lucky and the second chunk will fit in the wasted space
from the first chunk; but when it does cross a boundary, it will be a
bigger boundary and will wipe out any efficiency you gained.

Of course it depends on the exact distribution. But I don't see any
reason why we'd expect a distribution that would be favorable to
combining chunks together (except to avoid the chunk header, problem
(a)).
> I still, as I mentioned on the call, suspect that the right thing here
> is to use an allocation strategy that suffers from neither a nor b
> (for tuple and pergroup) and that has faster allocations too. That
> then also would have the consequence that we don't need to care about
> per-alloc overhead anymore (be it a or b).

It might make sense for the next release, but I'm wary of more churn in
nodeAgg.c at this stage. It's not a trivial change because the
different allocations happen in different places and combining them
would be tricky.

> > So do you generally favor the EstimateMemoryChunkSpace() patch (that
> > works for all context types)? Or do you have another suggestion? Or
> > do you think we should just do nothing?
> I think I'm increasingly leaning towards either using a constant
> overhead factor, or just getting rid of all memory context
> overhead. There's clearly no obviously correct design for the "chunk
> size" functions, and not having overhead is better than ~correctly
> estimating it.

Trying to actually eliminate the overhead sounds like v14 to me.

I believe the AllocSet overhead can be approximated with:
   16 + size/4

That would probably be better than a constant but a little hacky. I can
do that as a spot fix if this patch proves unpopular.
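
If it comes to that, the spot fix would amount to something like this
(the helper name is hypothetical; the constants just encode the
estimate above, i.e. roughly a 16-byte chunk header plus ~25% average
power-of-two rounding waste):

    /* Hypothetical sketch: approximate the per-chunk AllocSet overhead
     * for a request of the given size as a fixed header cost plus about
     * 25% average power-of-two rounding waste. The space charged per
     * allocation would then be size plus this estimate. */
    static size_t
    estimate_allocset_overhead(size_t size)
    {
        return 16 + size / 4;
    }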

        Jeff Davis
