Re: [HACKERS] The Future of Aggregation

Tomas Vondra Tue, 09 Jun 2015 09:22:24 -0700


On 06/09/15 17:27, Andres Freund wrote:

> On 2015-06-09 17:19:33 +0200, Tomas Vondra wrote:
>
>> ... and yet another use case for 'aggregate state combine' that I
>> just remembered about is grouping sets. What GROUPING SET (ROLLUP,
>> ...) do currently is repeatedly sorting the input, once for each
>> grouping.
>
> Actually, that's not really what happens. All aggregates that share
> a sort order are computed in parallel. Only when sets do not share
> an order additional sorts are required.

Oh, right, that's what I meant, but failed to explain clearly.

>> What could happen in some cases is building the most detailed
>> aggregation first, then repeatedly combine these partial states.
>
> I'm not sure that'll routinely be beneficial, because it'd require
> keeping track of all the individual "most detailed" results, no?

Yes, it requires tracking all the "detailed" aggregate states. I'm not
claiming this is beneficial in every case - sometimes the current sort
approach will be better, sometimes the combine approach will be faster.
In a sense it's similar to GroupAggregate vs. HashAggregate.

I expect this 'combine' approach will be much faster in cases with many
source rows but so few groups that the detailed states fit into
work_mem. In that case you can do hashagg and then walk through the hash
table to build the actual results. This entirely eliminates the
expensive sorts, which are killing us in many DWH workloads (because
real-world queries usually produce only a few rows, even on very large
data sets).
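
To make the idea concrete, here is a rough Python sketch (not PostgreSQL
code, just an illustration). It assumes an avg-like aggregate whose state
is a (count, sum) pair; the names transition, combine, detailed_states
and rollup are made up for this example. The input is hashed once on the
most detailed grouping (a, b), and the coarser ROLLUP levels are built by
walking the hash table and combining states, with no sort at all.

```python
from collections import defaultdict

EMPTY = (0, 0)  # hypothetical aggregate state: (row count, running sum)

def transition(state, value):
    """Fold one input value into an aggregate state."""
    count, total = state
    return (count + 1, total + value)

def combine(a, b):
    """Merge two partial states into one (the 'combine' step)."""
    return (a[0] + b[0], a[1] + b[1])

def detailed_states(rows):
    """Hash-aggregate rows of (a, b, value) on the most detailed key (a, b)."""
    states = defaultdict(lambda: EMPTY)
    for a, b, value in rows:
        states[(a, b)] = transition(states[(a, b)], value)
    return dict(states)

def rollup(states):
    """Build the (a) level and the grand total by combining detailed states."""
    by_a = defaultdict(lambda: EMPTY)
    total = EMPTY
    for (a, _b), state in states.items():
        by_a[a] = combine(by_a[a], state)
        total = combine(total, state)
    return dict(by_a), total

rows = [("x", 1, 10), ("x", 1, 20), ("x", 2, 30), ("y", 1, 40)]
detail = detailed_states(rows)
by_a, total = rollup(detail)
```

The source rows are read exactly once; everything above the (a, b) level
touches only the hash table, which is small whenever the number of groups
is small.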

But ISTM this might help even in cases when the detailed states don't
fit into memory, still assuming the number of groups is much smaller
than the number of source rows. Just do "partial aggregation" by
aggregating the source rows (using hashagg) until you fill work_mem.
Then either dump all the aggregate states to disk or only some of them
(least frequently used?) and continue. Then you can sort the states, and
assuming it's a much smaller amount of data, it'll be much faster than
sorting all the rows. And you can do the grouping sets using multiple
sorts, just like today.
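
A rough Python sketch of that spill variant, again only an illustration
under assumptions: the state is a (count, sum) pair, max_groups stands in
for the work_mem limit, "disk" is just a list of sorted runs, and the
whole table is dumped whenever it fills up (the simpler of the two
options above). The names partial_aggregate and merge_runs are invented
for the example.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def combine(a, b):
    """Merge two partial (count, sum) states."""
    return (a[0] + b[0], a[1] + b[1])

def partial_aggregate(rows, max_groups):
    """Hash-aggregate (key, value) rows; spill the whole table as a sorted
    run of (key, state) pairs whenever a new key would exceed max_groups."""
    table, runs = {}, []
    for key, value in rows:
        if key not in table and len(table) >= max_groups:
            runs.append(sorted(table.items()))  # stand-in for a write to disk
            table = {}
        count, total = table.get(key, (0, 0))
        table[key] = (count + 1, total + value)
    if table:
        runs.append(sorted(table.items()))
    return runs

def merge_runs(runs):
    """Merge the sorted runs and combine states that share a key."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    result = []
    for key, group in groupby(merged, key=itemgetter(0)):
        state = (0, 0)
        for _key, partial in group:
            state = combine(state, partial)
        result.append((key, state))
    return result

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
runs = partial_aggregate(rows, max_groups=2)
final = merge_runs(runs)
spilled_states = sum(len(run) for run in runs)
```

Only the spilled states get sorted and merged, not the source rows. In
this tiny input that is 5 states for 6 rows, so the gain is marginal; the
approach pays off only when keys repeat often between spills.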

Of course, this only works if the partial aggregation actually reduces
the amount of data spilled to disk - if the aggregate states grow fast,
or if you get the tuples in a certain order, it may get ugly.


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
