On Thu, 2018-06-21 at 13:44 -0700, David Gershuni wrote:
> To handle hash collisions, we can do the following:
>
> 1) We track the current hash code we're processing, in ascending
> order.
>
> 2) Instead of combining one group at a time, we'll maintain a list of
> all groups we've seen that match the current hash code.
>
> 3) When we peek at a spill file, if its next group's hash code matches
> our current hash code, we'll check to see if that group matches any of
> the groups in our list. If so, we'll combine it with the matching
> group. If not, we'll add this group to our list.
>
> 4) Once the head of each spill file exceeds the current hash code,
> we'll emit our list as output, empty it, and advance to the next hash
> code.
>
> Does that seem reasonable?
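For concreteness, here is a minimal sketch of the quoted merge scheme
(steps 1-4). This is illustrative Python, not PostgreSQL code; the
spill-file representation, `combine` callback, and all names are
hypothetical. It assumes each spill file yields (hash_code, key, state)
tuples sorted by hash_code:

```python
import heapq

def merge_spills(spill_files, combine):
    """Merge spill files sorted by hash code, keeping a list of
    distinct groups for the current hash code to handle collisions.
    `combine` merges two transition states for the same group key."""
    iters = [iter(f) for f in spill_files]
    heap = []
    for i, it in enumerate(iters):
        head = next(it, None)
        if head is not None:
            hash_code, key, state = head
            heapq.heappush(heap, (hash_code, i, key, state))

    current_hash = None
    current_groups = []  # [key, state] pairs sharing current_hash

    while heap:
        hash_code, i, key, state = heapq.heappop(heap)

        # Step 4: every remaining head exceeds the current hash code,
        # so emit the accumulated list and advance.
        if current_hash is not None and hash_code != current_hash:
            for k, s in current_groups:
                yield (k, s)
            current_groups = []
        current_hash = hash_code

        # Step 3: combine with a matching group, or add a new one.
        for g in current_groups:
            if g[0] == key:
                g[1] = combine(g[1], state)
                break
        else:
            current_groups.append([key, state])

        # Refill the heap from the spill file we just consumed.
        nxt = next(iters[i], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], i, nxt[1], nxt[2]))

    for k, s in current_groups:
        yield (k, s)
```

Note that the per-hash-code group list stays small unless many distinct
keys collide on the same hash code, which is the case the scheme is
designed around.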
It seems algorithmically reasonable but awfully complex. I don't think
that's a good path to take. I still only see two viable approaches:

(1) Spill tuples into hash partitions: works and is a reasonably simple
extension of our existing code. This is basically what the last patch I
posted does (posted before grouping sets, so it needs to be updated).

(2) Spill tuples into a sort: I like this approach because it can be
done simply (potentially less code than #1), and it could be further
improved without adding a ton of complexity. It may even eliminate the
need for the planner to choose between hashagg and sort. The problem
here is data types that have a hash opclass but no btree opclass. I
might try to sidestep this problem by saying that data types with no
btree opclass will not obey work_mem.

Additionally, there are two other ideas that we might want to pursue
regardless of which approach we take:

* support evicting transition states from the hash table, so that we
  can handle clustered input better

* refactor grouping sets so that phases are separate executor nodes,
  so that we can undo some of the complexity in nodeAgg.c

Regards,
	Jeff Davis
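P.S. For reference, approach (1) can be sketched roughly as follows.
This is an illustrative toy in Python, not the actual patch: the
function name, the count-based `limit` standing in for work_mem, and
the partition-routing scheme are all my own simplifications, and it
omits the recursion-depth safeguards real code would need:

```python
def hashagg_spill(tuples, limit, npartitions=4, depth=0):
    """Aggregate (key, value) pairs in an in-memory table; once the
    table holds `limit` groups, route overflow tuples to spill
    partitions by as-yet-unused hash bits, then aggregate each
    partition recursively."""
    table = {}
    spills = [[] for _ in range(npartitions)]
    for key, val in tuples:
        if key in table:
            table[key] += val          # existing group: combine in place
        elif len(table) < limit:
            table[key] = val           # room for a new group
        else:
            # Table is full: spill, choosing the partition from hash
            # bits not consumed at shallower recursion depths.
            p = (hash(key) >> (depth * 2)) % npartitions
            spills[p].append((key, val))
    yield from table.items()
    for part in spills:
        if part:
            yield from hashagg_spill(part, limit, npartitions, depth + 1)
```

Each recursion level sees only the keys that spilled, so partitions
shrink until they fit in memory; that is what makes this a simple
extension of the existing in-memory hashagg path.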