On Mon, Jan 16, 2017 at 10:59 AM, Finnerty, Jim <jfinn...@amazon.com> wrote:
> The ability to exploit hashed aggregation within sorted groups, when the 
> order of the input stream can be exploited this way, is potentially a useful 
> way to improve aggregation performance more generally.  This would 
> potentially be beneficial when the input size is expected to be larger than 
> the amount of working memory available for hashed aggregation, but where 
> there is enough memory to hash-aggregate just the unsorted grouping key 
> combinations, and when the cumulative cost of rebuilding the hash table for 
> each sorted subgroup is less than the cost of sorting the entire input.  In 
> other words, if most of the grouping key combinations are already segregated 
> by virtue of the input order, then hashing the remaining combinations within 
> each sorted group might be done in memory, at the cost of rebuilding the hash 
> table for each sorted subgroup.

Neat idea.
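
To make the quoted scheme concrete, here is a minimal sketch in Python, not
PostgreSQL executor code.  It assumes rows of (a, b, value) that arrive sorted
on a but not on b, and computes a sum per (a, b).  The function and variable
names are hypothetical, chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mixed_hash_sort_sum(rows):
    """Yield ((a, b), total) for rows of (a, b, value) already sorted on a.

    Only one per-run hash table is held at a time, so peak memory is
    bounded by the number of distinct b values within a single a-run.
    """
    # groupby splits the stream at each change of the sorted key a.
    for a, run in groupby(rows, key=itemgetter(0)):
        table = {}  # hash table rebuilt for every sorted subgroup
        for _, b, value in run:
            table[b] = table.get(b, 0) + value
        # The run is finished, so its groups are complete and can be emitted.
        for b, total in table.items():
            yield (a, b), total


# Hypothetical input: sorted on the first column, unsorted on the second.
rows = [
    (1, "x", 10), (1, "y", 5), (1, "x", 7),
    (2, "y", 1), (2, "z", 2), (2, "y", 3),
]
result = list(mixed_hash_sort_sum(rows))
```

The trade-off is the one described in the quote: the input never needs a
full sort on (a, b), and the price is one hash table rebuild per run of a.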

> I haven’t looked at the code for this change yet (I hope I will have the time 
> to do that).  Ideally the decision to choose the aggregation method as 
> sorted, hashed, or mixed hash/sort should be integrated into the cost model, 
> but given the notorious difficulty of estimating intermediate cardinalities 
> accurately it would be difficult to develop a cardinality model and a cost 
> model accurate enough to choose among these options consistently well.

Yes, that might be a little tricky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


