On Thu, Mar 19, 2020 at 12:25:16PM -0700, Jeff Davis wrote:
On Thu, 2020-03-19 at 19:11 +0100, Tomas Vondra wrote:
AFAICS the 2x allocation is the worst case, because it only happens
right after allocating a new block (of twice the size), when the
"utilization" drops from 100% to 50%. But in practice the utilization
will be somewhere in between, with an average of 75%.
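
As a back-of-the-envelope check of that 75% figure: when the newest,
doubled block is a fraction f full, used/allocated is roughly (1 + f)/2,
which averages 0.75 over the fill. A standalone sketch (plain C, not
PostgreSQL code) that just evaluates that average:

#include <stdio.h>

int
main(void)
{
    double  sum = 0.0;
    int     steps = 1000;

    /*
     * Right after a doubling, utilization is ~50%; it climbs linearly
     * back to 100% as the new block fills, so the average of
     * used/allocated over that interval is 75%.
     */
    for (int i = 0; i < steps; i++)
    {
        double  fill = (double) i / steps;  /* fraction of new block used */

        sum += (1.0 + fill) / 2.0;          /* used / allocated at that point */
    }
    printf("average utilization: %.0f%%\n", 100.0 * sum / steps);
    return 0;
}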

Sort of. Hash Agg is constantly watching the memory, so it will
typically spill right at the point where the accounting for that memory
context is off by 2X.

That's mitigated because the hash table itself (the array of
TupleHashEntryData) ends up allocated as its own block, so it does not
have any waste. The total (table memory + out-of-line data) might be
close to right if the hash table array itself is a large fraction of
the data, but I don't think that's what we want.
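
For anyone reading along, a minimal sketch of the kind of check being
described - not the actual patch; the context arguments and the
hash_mem_limit parameter are placeholders - using the block-level
memory accounting this thread is about:

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Sketch only: decide whether hash aggregation should start spilling,
 * based on the blocks allocated in the hash table's memory contexts.
 * Because the accounting is block-based, right after a block doubles
 * this can report up to ~2x the memory actually in use, which is the
 * point being made above.
 */
static bool
hashagg_over_limit(MemoryContext metacxt, MemoryContext tuplecxt,
                   Size hash_mem_limit)     /* e.g. derived from work_mem */
{
    Size    meta_mem = MemoryContextMemAllocated(metacxt, true);
    Size    tuple_mem = MemoryContextMemAllocated(tuplecxt, true);

    return meta_mem + tuple_mem > hash_mem_limit;
}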

And we're not doubling the block size indefinitely - there's an upper
limit, so over time the utilization drops less and less. So as the
contexts grow, the discrepancy disappears. And I'd argue the smaller
the context, the less of an issue the overcommit behavior is.

The problem is that the default work_mem is 4MB, and the doubling
behavior goes to 8MB, so it's a problem with default settings.
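
To put rough numbers on that, and on the upper limit mentioned above,
here is a standalone approximation (not the allocator itself), assuming
doubling from an 8kB initial block up to an 8MB maximum block:

#include <stdio.h>

#define INIT_BLOCK  (8L * 1024)
#define MAX_BLOCK   (8L * 1024 * 1024)

int
main(void)
{
    long    block = INIT_BLOCK;
    long    allocated = block;

    /*
     * After each new block, print how much is allocated versus how much
     * was in use just before the block was added, i.e. the worst-case
     * accounting error at that point.
     */
    while (allocated < 64L * 1024 * 1024)
    {
        long    used = allocated;   /* existing blocks assumed full */

        if (block < MAX_BLOCK)
            block *= 2;             /* double until the cap is reached */
        allocated += block;

        printf("used %6ld kB -> allocated %6ld kB (error %5ld kB)\n",
               used / 1024, allocated / 1024, (allocated - used) / 1024);
    }
    return 0;
}

The "used ~4MB -> allocated ~8MB" step is exactly the default work_mem
case, while for much larger limits the error stays capped at a single
8MB block.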


Yes, it's an issue for the accuracy of our accounting. What Robert was
talking about is overcommit behavior at the OS level, which I'm arguing
is unlikely to be an issue, because for low work_mem values the
absolute difference is small, and for large work_mem values the
difference is capped by the block size limit.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

