On Mon, 2018-06-11 at 11:55 -0400, Robert Haas wrote: > performance degradation is not necessarily much better than OOM. I > suspect that part of the reason why both Andres and David proposed > algorithms that combined hashing and sorting is out of a desire to > cater somewhat to both few-tuples-per-group and many-tuples-per- > group. >
I think average group size is the wrong way to look at it, because averages are only useful for a normal distribution. The two most interesting cases are: 1. Fairly uniform group size (e.g. normal dist) 2. Skew values, power law distribution, etc., where some groups are very large and most are tiny by comparison. I am calling the very large groups "skewed groups". For #1, hashing and sorting are both reasonable, and it depends on a lot of factors (number of groups, clustering, available memory). For #2, hashing is really good and sorting is really bad. That's because (at present) we sort all of the tuples before doing any aggregation, so it expends a lot of effort on the skewed groups. Hashing can keep skewed groups in memory and avoid spilling a large fraction of the input tuples at all. If we get the skewed groups into the hash table, and avoid OOM, I think that is a success. My patch did that, except it didn't account for two cases: (a) clustered input, where skewed groups may not appear until the hash table is already full; and (b) variable-sized transition values that grow with the group size > One big advantage to just partitioning the input tuples by hash code > and then proceeding batch by batch is that it works for any aggregate > that can support hash aggregation in the first place. You do not Agreed, but I think we should evaluate Andres's idea of feeding spilled tuples to a Sort, because the overall complexity might be lower even accounting for the special cases you're worried about. Also, I am not sure we should really be designing around data types where it makes sense to group and then don't supply a btree opclass. Seems like they are likely to hit a problem soon anyway. Regards, Jeff Davis