On Mon, Jun 29, 2020 at 10:20:14AM -0700, Peter Geoghegan wrote:
> On Mon, Jun 29, 2020 at 8:07 AM Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
> > Not sure I follow. Which cases do you mean when you say that 12 could
> > safely do them, but 13 won't? I see the following two cases:
> >
> > a) Planner in 12 and 13 disagree about whether the hash table will fit
> > into work_mem.
> >
> > I don't quite see why this would be the case (given the same cardinality
> > estimates etc.), though. That is, if 12 says "will fit" I'd expect 13 to
> > end up with the same conclusion. But maybe 13 has higher per-tuple
> > overhead or something? I know we set aside some memory for BufFiles, but
> > not when we expect the whole hash table to fit into memory.
>
> I have no reason to believe that the planner is any more or any less
> likely to conclude that the hash table will fit in memory in v13 as
> things stand (I don't know if the BufFile issue matters).
>
> In general, grouping estimates probably aren't very good compared to
> join estimates. I imagine that in either v12 or v13 the planner is
> likely to incorrectly believe that it'll all fit in memory fairly
> often. v12 was much too permissive about what could happen. But v13 is
> too conservative.


Can you give an example of what you mean by being too permissive or too
conservative? Do you mean the possibility of unlimited memory usage in
v12, and strict enforcement in v13?

IMO enforcing the work_mem limit (in v13) is right in principle, but I
do understand the concerns about unexpected regressions compared to v12.
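
Just to illustrate what I mean by enforcing the limit - on v13 the
spilling is directly visible in EXPLAIN ANALYZE (a rough sketch, the
table name and all the numbers are made up):

  SET work_mem = '4MB';

  EXPLAIN (ANALYZE, COSTS OFF)
  SELECT a, count(*) FROM big_table GROUP BY a;

  -- on v13 the HashAggregate node reports the spilling, roughly:
  --   HashAggregate
  --     Group Key: a
  --     Batches: 33  Memory Usage: 4241kB  Disk Usage: 113472kB
  -- while v12 would simply keep growing the hash table past work_mem.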

> > b) Planner believes the hash table will fit, due to underestimate.
> >
> > On 12, we'd just let the hash table overflow, which may be a win when
> > there's enough RAM and the estimate is not "too wrong". But it may
> > easily end with a sad OOM.
>
> It might end up with an OOM on v12 due to an underestimate -- but
> probably not! The fact that a hash aggregate is faster than a group
> aggregate ameliorates the higher memory usage. You might actually use
> less memory this way.


I don't understand what you mean by "less memory" here - the whole
issue is that we significantly exceed work_mem.

I don't think an OOM is the only performance risk here - using a lot of
memory also forces eviction of data from the page cache (although
writing a lot of temporary files may have a similar effect).

> > On 13, we'll just start spilling. True - people tend to use conservative
> > work_mem values exactly because of cases like this (which is somewhat
> > futile as the underestimate may be arbitrarily wrong) and also because
> > they don't know how many work_mem instances the query will use.
> >
> > So yeah, I understand why people may not want to increase work_mem too
> > much, and maybe hash_mem would be a way to get the "no spill" behavior.
>
> Andres wanted to increase the amount of memory that could be used at
> execution time, without changing planning. You could say that hash_mem
> is a more ambitious proposal than that. It's changing the behavior
> across the board -- though in a way that makes sense anyway. It has
> the additional benefit of making it more likely that an in-memory hash
> aggregate will be used. That isn't a problem that we're obligated to
> solve now, so this may seem odd. But if the more ambitious plan is
> actually easier to implement and support, why not pursue it?
>
> hash_mem seems a lot easier to explain and reason about than having
> different work_mem budgets during planning and execution, which is
> clearly a kludge. hash_mem makes sense generally, and more or less
> solves the problems raised on this thread.


I agree with this, and I'm mostly OK with having hash_mem. In fact, from
the proposals in this thread I like it the most - as long as it's used
both during planning and execution. It's a pretty clear solution.

It's not a perfect solution in the sense that it does not reintroduce
the v12 behavior exactly (i.e. we'll still spill after reaching
hash_mem), but that may be good enough.
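
To make that a bit more concrete, this is roughly how I'd expect it to
be used (hash_mem is just the proposed GUC, so the name and exact
semantics may well end up different):

  -- hypothetical, assumes the proposed hash_mem GUC exists
  SET work_mem = '32MB';    -- sorts etc. still limited by work_mem
  SET hash_mem = '256MB';   -- hash tables get the larger budget

  SELECT a, count(*) FROM big_table GROUP BY a;

  -- planning: an in-memory hash aggregate is chosen if the estimated
  -- hash table fits into hash_mem (not work_mem)
  -- execution: the hash aggregate starts spilling only after actually
  -- exceeding hash_mem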


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

