On Fri, Jun 26, 2020 at 05:24:36PM -0700, Peter Geoghegan wrote:
On Fri, Jun 26, 2020 at 4:59 PM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
I agree larger work_mem for hashagg (and thus less spilling) may mean
lower work_mem for some other nodes that are less sensitive to this.
But I think this needs to be formulated as a cost-based decision,
although I don't know how to do that for the reasons I explained before
(bottom-up plan construction vs. distributing the memory budget).

Why do you think that it needs to be formulated as a cost-based
decision? That's probably true of a scheme that allocates memory very
intelligently, but what about an approach that's slightly better than
work_mem?


Well, there are multiple ideas discussed in this (sub)thread, one of
them being a per-query memory limit. That requires deciding how much
memory each node should get, which I think would need to be
cost-based.
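
Just to illustrate what I mean (nothing like this exists in the code,
all the names here are made up), the "distribution" step might look
something like this - and it clearly needs per-node memory estimates,
i.e. costing information:

/*
 * Hypothetical sketch only: split a per-query memory budget across the
 * memory-hungry nodes proportionally to their estimated needs.  None of
 * these names exist in PostgreSQL.
 */
typedef struct MemNode
{
    double      est_need;       /* planner's memory estimate, in kB */
    double      granted;        /* budget actually handed to the node */
} MemNode;

static void
distribute_query_mem(MemNode *nodes, int nnodes, double query_mem_limit)
{
    double      total_need = 0.0;

    for (int i = 0; i < nnodes; i++)
        total_need += nodes[i].est_need;

    for (int i = 0; i < nnodes; i++)
    {
        if (total_need <= query_mem_limit)
            nodes[i].granted = nodes[i].est_need;   /* everything fits */
        else
            nodes[i].granted =
                query_mem_limit * (nodes[i].est_need / total_need);
    }
}

And of course the chicken-and-egg problem is that we only know the full
set of nodes once the plan is built, while the nodes would need to know
their budget during costing.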

What problems do you foresee (if any) with adding a hash_mem GUC that
gets used for both planning and execution for hash aggregate and hash
join nodes, in about the same way as work_mem is now?


Of course, a simpler scheme like this would not require that. And maybe
introducing hash_mem is a good idea - I'm not particularly opposed to
that, actually. But I think we should not introduce separate memory
limits for each node type, which was also mentioned earlier.
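
Just so we're on the same page about what that would mean (a sketch
only - the hash_mem GUC does not exist yet, and get_node_mem_limit() is
a made-up helper): hash-based nodes would consult hash_mem in both the
planner and the executor, everything else keeps using work_mem.

#include <stdbool.h>

extern int  work_mem;           /* existing GUC, in kB */
static int  hash_mem = -1;      /* proposed GUC; -1 = fall back to work_mem */

static int
get_node_mem_limit(bool is_hash_node)
{
    if (is_hash_node && hash_mem > 0)
        return hash_mem;        /* hash agg / hash join */
    return work_mem;            /* all other nodes unchanged */
}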

The problem of course is that hash_mem does not really solve the issue
discussed at the beginning of this thread, i.e. regressions due to
underestimates and unexpected spilling at execution time.

The thread is becoming a rather confusing mix of proposals for how to
fix that in v13 and proposals for how to improve our configuration of
memory limits :-(

FWIW some databases already do something like this - SQL Server has
something called "memory grant" which I think mostly does what you
described here.

Same is true of Oracle. But Oracle also has simple work_mem-like
settings for sorting and hashing. People don't really use them anymore,
but apparently it was once common for the DBA to explicitly give over
more memory to hashing -- much like the hash_mem setting I asked about.
IIRC the same is true of DB2.


Interesting. What is not entirely clear to me is how these databases
decide how much memory each node should get during planning. With the
separate work_mem-like settings it's fairly obvious, but how do they do
that with a global limit (either per-instance or per-query)?

The difference between sort and hashagg spills is that for sorts
there is no behavior change. Plans that did (not) spill in v12 will
behave the same way on v13, modulo some random perturbation. For
hashagg that's not the case - some queries that did not spill before
will spill now.

So even if the hashagg spills are roughly equal to sort spills, both
are significantly more expensive than not spilling.

Just to make sure we're on the same page: both are significantly more
expensive than a hash aggregate not spilling *specifically*. OTOH, a
group aggregate may not be much slower when it spills compared to an
in-memory sort group aggregate. It may even be noticeably faster, due
to caching effects, as you mentioned at one point upthread.

This is the property that makes hash aggregate special, and justifies
giving it more memory than other nodes on a system-wide basis (the same
thing applies to hash join). This could even work as a multiplier of
work_mem.


Yes, I agree.
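
And if it ends up being a multiplier rather than an absolute limit, the
effective limit for the hash-based nodes might be computed roughly like
this (again just a sketch, hash_mem_multiplier is a name I'm making up
for illustration):

extern int  work_mem;                       /* existing GUC, in kB */
static double hash_mem_multiplier = 4.0;    /* e.g. 4x work_mem for hashing */

static long
hash_node_mem_limit_kb(void)
{
    /* with work_mem = 4096 kB and multiplier = 4.0 this gives 16384 kB */
    return (long) (work_mem * hash_mem_multiplier);
}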


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

