On Mon, Jan 16, 2017 at 10:59 AM, Finnerty, Jim <jfinn...@amazon.com> wrote:
> The ability to exploit hashed aggregation within sorted groups, when the
> order of the input stream can be exploited this way, is potentially a useful
> way to improve aggregation performance more generally. This would
> potentially be beneficial when the input size is expected to be larger than
> the amount of working memory available for hashed aggregation, but where
> there is enough memory to hash-aggregate just the unsorted grouping key
> combinations, and when the cumulative cost of rebuilding the hash table for
> each sorted subgroup is less than the cost of sorting the entire input. In
> other words, if most of the grouping key combinations are already segregated
> by virtue of the input order, then hashing the remaining combinations within
> each sorted group might be done in memory, at the cost of rebuilding the hash
> table for each sorted subgroup.
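
For illustration only, the scheme quoted above can be sketched in a few lines of Python. This is not taken from the patch or from the PostgreSQL executor. The function name mixed_hash_sort_sum, the n_sorted parameter, and the (key_tuple, value) row layout are all hypothetical, and a simple SUM stands in for an arbitrary aggregate. The sketch assumes the input is already ordered on the first n_sorted grouping columns, so only the key combinations inside one sorted subgroup have to fit in the hash table at a time.

```python
from itertools import groupby


def mixed_hash_sort_sum(rows, n_sorted):
    """Sum values per full grouping key.

    rows: iterable of (key_tuple, value), assumed to arrive ordered on the
    first n_sorted columns of key_tuple (hypothetical layout).
    Yields (key_tuple, total) pairs, one sorted subgroup at a time.
    """
    def sorted_prefix(row):
        # The already-sorted leading columns of the grouping key.
        return row[0][:n_sorted]

    for _, subgroup in groupby(rows, key=sorted_prefix):
        # The hash table is rebuilt for each sorted subgroup, so it only
        # has to hold the key combinations that occur within that subgroup.
        table = {}
        for key, value in subgroup:
            table[key] = table.get(key, 0) + value
        yield from table.items()


rows = [
    (("a", "x"), 1),
    (("a", "y"), 2),
    (("a", "x"), 3),
    (("b", "y"), 4),
    (("b", "x"), 5),
    (("b", "y"), 6),
]
result = list(mixed_hash_sort_sum(rows, n_sorted=1))
```

Here the input is ordered on the first key column only. The second column is unsorted within each subgroup and is handled by the per-subgroup hash table, which is discarded once the subgroup ends.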

Neat idea.

> I haven’t looked at the code for this change yet (I hope I will have the time
> to do that). Ideally the decision to choose the aggregation method as
> sorted, hashed, or mixed hash/sort should be integrated into the cost model,
> but given the notorious difficulty of estimating intermediate cardinalities
> accurately it would be difficult to develop a cardinality model and a cost
> model accurate enough to choose among these options consistently well.

Yes, that might be a little tricky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company