Re: [HACKERS] Parallel Aggregate

David Rowley Mon, 07 Mar 2016 14:17:17 -0800

On 5 March 2016 at 07:25, Robert Haas <[email protected]> wrote:
> On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
>> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
>> plans. e.g it could be an idea to perform Partial Hash Aggregate ->
>> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
>> stage. I just thought doing this is more complex than what's really
>> needed, but if someone can think of a case where this would be a great
>> win then I'll listen, but you have to remember we don't have any
>> pre-sorted partial paths at this stage, so an explicit sort is
>> required *always*. This might change if someone invented partial btree
>> index scans... but until then...
>
> Actually, Rahila Syed is working on that.  But it's not done yet, so
> presumably will not go into 9.6.
>
> I don't really see the logic of this, though.  Currently, Gather
> destroys the input ordering, so it seems preferable for the
> finalize-aggregates stage to use a hash aggregate whenever possible,
> whatever the partial-aggregate stage did.  Otherwise, we need an
> explicit sort.  Anyway, it seems like the two stages should be costed
> and decided on their own merits - there's no reason to chain the two
> decisions together.


Thanks for looking at this.
I've attached an updated patch which re-bases the whole patch on top
of the upper planner changes which have just been committed.
In this version create_grouping_paths() does now consider mixed
strategies of hashed and sorted, although I have a few concerns with
the code that I've written. I'm solely posting this early to minimise
any duplicate work.

My concerns are:
1. Since there's no cheapest_partial_path in RelOptInfo the code is
currently considering every partial_path for parallel hash aggregate.
With normal aggregation we only ever use the cheapest path, so this
may not be future proof. As of today we do only have at most one
partial path in the list, but there's no reason to code this with that
assumption. I didn't put in much effort to improve this as I see code
in generate_gather_paths() which also makes assumptions about there
just being 1 partial path. Perhaps we should expand RelOptInfo to
track the cheapest partial path? or maybe allpaths.c should have a
function to fetch the cheapest out of the list?

2. In mixed parallel aggregate mode, when the query has no aggregate
functions, the code currently will use a nodeAgg for AGG_SORTED
strategy rather than a nodeGroup, as it would in serial agg mode. This
probably needs to be changed.

3. Nothing in create_grouping_paths() looks at the force_parallel_mode
GUC. I had a quick look at this GUC and was a bit surprised to see 3
possible states, but no explanation of what they do, so I've not added
code which pays attention to this setting yet. I'd imagine this is
just a matter of skipping serial path generation when parallel is
possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
what FORCE_PARALLEL_REGRESS is for yet.

The setrefs.c parts of the patch remain completely broken. I've not
had time to look at this again yet, sorry.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

parallel_aggregation_cc75f61_2016-03-08.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Aggregate

Reply via email to