On Sat, Jun 27, 2020 at 3:00 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> I think the advantage of delaying it is that we
> might see some real problems (like where hash aggregate is not a good
> choice) which can be fixed via the costing model.
I think any problem that might come up with the costing is best
thought of as a distinct problem. This thread is mostly about users
getting fewer in-memory hash aggregates than they did in a previous
release running the same application (there has been some discussion
of the other problem too [1], but it's thought to be less serious).

The problem is that affected users were theoretically never entitled
to the performance they came to rely on, and yet there is good reason
to think that hash aggregate really should be entitled to more memory.
They won't care that they were theoretically never entitled to that
performance, though -- they *liked* the fact that hash agg could
cheat. And they'll dislike the fact that this cannot be corrected by
tuning work_mem, since that affects every node type that consumes
work_mem, not just hash aggregate -- raising it across the board could
cause OOMs for them.

There are two or three similar ideas under discussion that might fix
the problem. They all seem to involve admitting that hash aggregate's
"cheating" might actually have been a good thing all along (even
though giving hash aggregate vastly more memory than other nodes is
terrible), and giving hash aggregate license to "cheat openly".

Note that the problem isn't exactly a problem with the hash aggregate
spilling patch. You could think of it as a pre-existing issue -- a
failure to give hash aggregate the extra memory it really should be
entitled to. Jeff's patch just made the issue more obvious.

[1] https://postgr.es/m/20200624191433.5gnqgrxfmucex...@alap3.anarazel.de

--
Peter Geoghegan