On Thu, Jun 25, 2020 at 11:16:23AM -0700, Andres Freund wrote:
Hi,
On 2020-06-25 10:44:42 -0700, Jeff Davis wrote:
There are only two possible paths: HashAgg and Sort+Group, and we need
to pick one. If the planner expects one to spill, it is likely to
expect the other to spill. If one spills in the executor, then the
other is likely to spill, too. (I'm ignoring the case with a lot of
tuples and few groups because that doesn't seem relevant.)
There's also ordered index scan + Group. Which will often be vastly
better than Sort+Group, but still slower than HashAgg.
Imagine that there was only one path available to choose. Would you
suggest the same thing, that unexpected spills can exceed work_mem but
expected spills can't?
I'm not saying what I propose is perfect, but I've yet to hear a better
proposal. Given that there *are* different ways to implement
aggregation, and that we use expected costs to choose, I think the
assumed costs are relevant.
I share Jeff's opinion that this is quite counter-intuitive and we'll
have a hard time explaining it to users.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services