On Wed, Jun 6, 2018 at 8:16 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> ... and this is pretty much what Jeff Davis suggested, I think. The
> trouble is, each of those cases behaves nicely/terribly in different
> corner cases.
That's a really good point. If the number of groups is pretty small compared to the number of input tuples, then you really only ever want to dump out transition values. By doing so, you minimize the amount of data you have to write. But if the number of groups is, say, half the number of input tuples, then computing transition values before you have all the values that belong to that group is probably a waste of time. I wonder if we could decide what to do by comparing the number of input tuples consumed to the number of groups created at the time we run out of memory. If we've got less than, I dunno, five tuples per group, then assume most groups are small. Pick a strategy that (mainly) spools input tuples. Otherwise, pick a strategy that spools transition tuples. (A toy sketch of this heuristic is appended below my signature.)

In either case, it seems like we can pick a pure hashing strategy or switch to using both hashing and sorting. For example, IIUC, Andres's proposal involves spooling mostly input tuples, but can also spool transition tuples if the transition values consume more memory as they absorb more tuples. However, you could do a similar kind of thing without needing sort support. When you see a value that doesn't fit in your in-memory hash table, use the hash code to route it to 1 of N batch files. Have a second set of batch files for transition tuples in case you need to kick things out of the in-memory hash table. Once you finish reading the input, emit all the values that remain in the in-memory hash table and then process each batch file separately. (Sketch appended below.)

Similarly, David's strategy involves spooling only transition tuples and then sorting on the group key, but it's again possible to do something similar without relying on sorting. Instead of flushing the in-memory hash table to a tuple store, split the transition tuples it contains among N batch files based on the hash code. Once you've read all of the input, go back and reprocess each batch file, combining transition values whenever the same group keys appear in more than one transition tuple. (Also sketched below.)

To me, the pure-hashing strategies look a little simpler, but maybe there's enough performance benefit from combining hashing and sorting that it's worth the complexity, or maybe we should just accept whichever variant somebody's willing to code. But I think we almost have to have separate handling for many-rows-per-group and few-rows-per-group, because those seem fundamentally different.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
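For concreteness, here's roughly what I mean by the tuples-per-group cutoff. The function name, the strategy labels, and the factor of five are all just placeholders, not a concrete proposal:

def choose_spill_strategy(input_tuples_consumed, groups_created):
    # Fewer than ~5 tuples per group when memory runs out: assume most
    # groups are small and (mainly) spool input tuples.  Otherwise,
    # spool transition tuples.  The threshold is a guess.
    if input_tuples_consumed < 5 * groups_created:
        return "spool_input_tuples"
    return "spool_transition_tuples"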
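And here's a toy Python sketch (nothing like the real executor code, obviously) of the no-sort variant of Andres's approach: input tuples whose group doesn't fit in the hash table get routed by hash code to 1 of N batch files, and each batch is then processed separately. N_BATCHES, MAX_GROUPS, and trans() are invented for the example, and the second set of batch files for transition states evicted from the hash table is left out to keep it short:

import pickle
import tempfile

N_BATCHES = 4        # spill fan-out per pass; arbitrary
MAX_GROUPS = 1024    # stand-in for a work_mem-style limit

def trans(state, value):
    # Example transition function (SUM); None is the initial state.
    return (0 if state is None else state) + value

def read_batch(f):
    # Replay the tuples previously spooled to a batch file.
    f.seek(0)
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        pass

def hashagg(tuples, depth=0):
    table = {}
    batches = [tempfile.TemporaryFile() for _ in range(N_BATCHES)]
    spilled = False
    for key, value in tuples:
        if key in table:
            # Groups already in memory keep absorbing input even after
            # we start spilling; only unseen keys get spooled.
            table[key] = trans(table[key], value)
        elif len(table) < MAX_GROUPS:
            table[key] = trans(None, value)
        else:
            # Doesn't fit: use the hash code to route the raw input
            # tuple to 1 of N batch files.
            spilled = True
            pickle.dump((key, value), batches[hash((depth, key)) % N_BATCHES])
    # Emit everything that stayed in memory, then process each batch
    # file separately; each batch holds a disjoint subset of the
    # remaining groups.
    yield from table.items()
    if spilled:
        for f in batches:
            yield from hashagg(read_batch(f), depth + 1)

So list(hashagg([("a", 1), ("b", 2), ("a", 3)])) gives [("a", 4), ("b", 2)]. The one non-obvious bit is salting the hash with the recursion depth: every tuple in a given batch file has the same hash modulo N_BATCHES, so repartitioning an oversized batch on the unsalted hash would never split it.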
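Finally, an equally made-up sketch of the no-sort variant of David's approach: when memory fills up, the in-memory transition tuples are split among N batch files by hash code instead of going to a tuple store, and each batch is reprocessed at the end, combining partial states whenever the same group key shows up in more than one transition tuple. combine() plays the role of the aggregate's combine function, and the small helpers are repeated from the previous sketch so this stands alone:

import pickle
import tempfile

N_BATCHES = 4
MAX_GROUPS = 1024

def trans(state, value):
    return (0 if state is None else state) + value

def combine(state1, state2):
    # Merge two partial states for the same group (for SUM, just +).
    return state1 + state2

def read_batch(f):
    f.seek(0)
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        pass

def flush(table, batches):
    # Partition the in-memory transition tuples among the batch files
    # by hash code, rather than flushing them to a tuple store.
    for key, state in table.items():
        pickle.dump((key, state), batches[hash(key) % N_BATCHES])
    table.clear()

def hashagg_spool_states(tuples):
    table = {}
    batches = [tempfile.TemporaryFile() for _ in range(N_BATCHES)]
    spilled = False
    for key, value in tuples:
        if key not in table and len(table) >= MAX_GROUPS:
            spilled = True
            flush(table, batches)
        table[key] = trans(table.get(key), value)
    if not spilled:
        yield from table.items()
        return
    flush(table, batches)
    # Reprocess each batch file; a key may appear once per flush, so
    # its partial states get combined here.  (A real implementation
    # would have to recurse, or switch strategies, if a single batch
    # still didn't fit in memory.)
    for f in batches:
        merged = {}
        for key, state in read_batch(f):
            if key in merged:
                merged[key] = combine(merged[key], state)
            else:
                merged[key] = state
        yield from merged.items()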