Re: [HACKERS] Combining Aggregates

David Rowley Sun, 26 Jul 2015 20:00:07 -0700

On 27 July 2015 at 04:58, Heikki Linnakangas <[email protected]> wrote:


> On 04/01/2015 06:28 PM, Robert Haas wrote:
>
>> On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier
>> <[email protected]> wrote:
>>
>>> I've been thinking of bumping this patch to the June commitfest as the
>>>> patch only exists to provide the basic infrastructure for things like
>>>> parallel aggregation, aggregate before join, and perhaps auto updating
>>>> materialised views.
>>>>
>>>> It seems unlikely that any of those things will happen for 9.5.
>>>>
>>>
>>> Yeah, I guess so...
>>>
>>>  Does anybody object to me moving this to June's commitfest?
>>>>
>>>
>>> Not from my side FWIW. I think it actually makes sense.
>>>
>>
>> +1.  I'd like to devote some time to looking at this, but I don't have
>> the time right now.  The chances that we can do something useful with
>> it in 9.6 seem good.
>>
>
> And the June commitfest is now in progress.
>
> This patch seems sane to me, as far as it goes. However, there's no
> planner or executor code to use the aggregate combining for anything. I'm
> not a big fan of dead code, I'd really like to see something to use this.
>

Thanks for looking. I partially agree with that, it is a little weird to
put in code that's yet to be used. I'd certainly agree 95% if this was the
final commitfest of 9.5, but we're in the first commitfest of 9.6 and I
think there's a very high probability of this getting used in 9.6, and
likely that probability would be even higher if the code is already in.
Perhaps it's a little bit in the same situation as to Robert's parallel
worker stuff?



>
> The main use case people have been talking about is parallel query, but is
> there some other case this would be useful right now, without the parallel
> query feature? You and Simon talked about this case:
>
>  2. Queries such as:
>>
>> SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON
>> s.product_id
>> = p.product_id GROUP BY p.name;
>>
>> Such a query could be transformed into:
>>
>> SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
>> GROUP BY product_id) s
>> INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name;
>>
>> Of course the outer query's SUM and GROUP BY would not be required if
>> there
>> happened to be a UNIQUE index on product(name), but assuming there's not
>> then the above should produce the results faster. This of course works ok
>> for SUM(), but for something like AVG() or STDDEV() the combine/merge
>> aggregate functions would be required to process those intermediate
>> aggregate results that were produced by the sub-query.
>>
>
> Any chance you could implement that in the planner?
>
>
Yes! I'm actually working on it now and so far have it partially working.
Quite likely I'll be able to submit for CF2. There's still some costing
tweaks to do. So far it just works for GROUP BY with no aggs. I plan to fix
that later using this patch.

I don't want to talk too much about it on this thread, but in a test query
which is the one in my example, minus the SUM(qty), with 1 million sales
records, and 100 products, performance goes from 350ms to 200ms on my
machine, so looking good so far.

Regards

David Rowley
--
 David Rowley                   http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

Reply via email to