On Thu, Jul 11, 2019 at 06:06:33PM -0700, Jeff Davis wrote:
On Thu, 2019-07-11 at 17:55 +0200, Tomas Vondra wrote:
Makes sense. I haven't thought very much about how the hybrid approach
would be implemented, so I can't quite judge how complicated it would be
to extend "approach 1" later. But if you think it's a sensible first
step, I trust you. And I certainly agree we need something to compare
the other approaches against.
Is this a duplicate of your previous email?
Yes. I don't know how I managed to send it again. Sorry.
I'm slightly confused, but I will use the opportunity to put out another
WIP patch. The patch could use a few rounds of cleanup and quality
work, but the functionality is there and the performance seems
reasonable.
I rebased on master, fixed a few bugs and, most importantly, added tests.
It seems to be working fine with grouping sets. It will take a little
longer to get good performance numbers, but even for a group size of
one, I'm seeing HashAgg get close to Sort+Group in some cases.
Nice! That's very nice progress!
You are right that the missed lookups appear to be costly, at least
when the data all fits in system memory. I think it's the cache misses,
because sometimes reducing work_mem improves performance. I'll try
tuning the number of buckets for the hash table and see if that helps.
If not, then the performance still seems pretty good to me.
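To make the cost of missed lookups concrete, here is a minimal sketch in Python, assuming the simplest memory-bounded strategy discussed in this thread: once the hash table is full, tuples whose group is already present keep being aggregated in memory, while tuples whose lookup misses are spilled and reprocessed in a later pass. The function name `hashagg_count` and the `max_groups` limit (a stand-in for work_mem) are hypothetical, the "spill file" is just a list, and this is not the code in the patch.

```python
def hashagg_count(keys, max_groups):
    """Hypothetical sketch: COUNT(*) GROUP BY key with a bounded hash table.

    max_groups stands in for work_mem. Returns (groups, passes, missed),
    where missed counts lookups that failed on a full table and caused
    the tuple to be spilled and probed again in a later pass.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    result = {}
    pending = list(keys)
    passes = 0
    missed = 0
    while pending:
        passes += 1
        table = {}
        spilled = []  # stand-in for a spill file on disk
        for key in pending:
            if key in table:
                table[key] += 1      # hit: advance the existing group
            elif len(table) < max_groups:
                table[key] = 1       # miss, but room left: create a group
            else:
                missed += 1          # miss on a full table: spill the tuple
                spilled.append(key)
        # A key is either entirely in the table or entirely spilled, so
        # groups completed in this pass are final.
        result.update(table)
        pending = spilled
    return result, passes, missed


result, passes, missed = hashagg_count([1, 2, 3, 1, 2, 3, 4, 4, 5], max_groups=2)
# result == {1: 2, 2: 2, 3: 2, 4: 2, 5: 1}, passes == 3, missed == 6
```

Every spilled tuple pays for a failed probe before it is written out, and again when it is reprocessed, which is where the overhead for small group sizes comes from. A Python dict will not reproduce the CPU cache effects mentioned above; the sketch only shows where the extra lookups happen.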
Of course, HashAgg can beat sort for larger group sizes, but I'll try
to gather some more data on the cross-over point.
Yes, makes sense. I think it's acceptable as long as we consider this
during costing (when we know in advance we'll need this) or treat it as
an emergency measure.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services