On Sun, Jun 28, 2020 at 05:40:16PM -0700, Peter Geoghegan wrote:
> I think any problem that might come up with the costing is best
> thought of as a distinct problem. This thread is mostly about the
> problem of users getting fewer in-memory hash aggregates compared to a
> previous release running the same application (though there has been
> some discussion of the other problem, too [1], but it's thought to be
> less serious).
>
> The problem is that affected users were theoretically never entitled
> to the performance they came to rely on, and yet there is good reason
> to think that hash aggregate really should be entitled to more memory.
> They won't care that they were theoretically never entitled to that
> performance, though -- they *liked* the fact that hash agg could
> cheat. And they'll dislike the fact that this cannot be corrected by
> tuning work_mem, since that affects all node types that consume
> work_mem, not just hash aggregate -- that could cause OOMs for them.
>
> There are two or three similar ideas under discussion that might fix
> the problem. They all seem to involve admitting that hash aggregate's
> "cheating" might actually have been a good thing all along (even
> though giving hash aggregate much more memory than other nodes is
> terrible), and giving hash aggregate license to "cheat openly". Note
> that the problem isn't exactly a problem with the hash aggregate
> spilling patch. You could think of the problem as a pre-existing issue
> -- a failure to give more memory to hash aggregate, which really
> should be entitled to more memory. Jeff's patch just made the issue
> more obvious.
In thinking about this some more, I came up with two ideas.

First, in pre-PG 13, we didn't choose hash aggregate if we thought it
would spill, but if we misestimated and it used more than work_mem, we
allowed it. The effect of this is that if we were close, but it went
over, we allowed it, just for hash aggregate. Is this something we want
to codify for all node types, i.e., choose a non-spill node type if we
need a lot more than work_mem, but then let work_mem be a soft limit if
we do choose it, e.g., allow 50% over work_mem in the executor for
misestimation before spilling? My point is, do we want to use a lower
work_mem for planning and a higher one in the executor before spilling?

My second thought comes from an earlier report that spilling is very
expensive, but a smaller work_mem doesn't seem to hurt much. Would we
achieve better overall performance by giving a few nodes a lot of
memory (and not spilling those), and other nodes very little, rather
than giving them all the same size and having them all spill?

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
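As a rough illustration of the first idea above, here is a minimal toy
model (not PostgreSQL code; the function name, the 4 MB budget, and the
50% overrun factor are all assumptions for the sketch): the planner
costs against work_mem, but the executor only spills once actual usage
exceeds work_mem by more than a soft-limit factor, so a modest
misestimation stays in memory while a large blowout still spills.

```python
# Hypothetical sketch of the "soft limit" proposal -- not real
# PostgreSQL behavior. Names and numbers are illustrative assumptions.

WORK_MEM_KB = 4096        # planner-time budget (assumed, ~4 MB)
SOFT_LIMIT_FACTOR = 1.5   # executor tolerates 50% over before spilling

def hash_agg_spills(actual_usage_kb: int) -> bool:
    """Return True if the executor would spill to disk.

    The planner chose this node believing it fits in WORK_MEM_KB; the
    executor forgives a bounded overrun (misestimation) and only spills
    past WORK_MEM_KB * SOFT_LIMIT_FACTOR.
    """
    return actual_usage_kb > WORK_MEM_KB * SOFT_LIMIT_FACTOR

# A modest misestimation (5000 KB actual vs 4096 KB planned) is
# tolerated and stays in memory:
print(hash_agg_spills(5000))    # → False
# A large blowout (10000 KB) still spills:
print(hash_agg_spills(10000))   # → True
```

The design question this models is exactly the trade-off in the email:
a hard limit penalizes small estimation errors with expensive spilling,
while an unbounded limit (pre-PG 13 hash aggregate) risks OOM; a soft
limit bounds the damage in both directions.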