Re: [HACKERS] Parallel Aggregate

David Rowley Sun, 20 Mar 2016 16:31:08 -0700

On 21 March 2016 at 09:47, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> On 03/20/2016 09:58 AM, David Rowley wrote:
>> Thank you for the reviews. The only thing I can think to mention which
>> I've not already is that I designed estimate_hashagg_tablesize() to be
>> reusable in various places in planner.c, yet I've only made use of it
>> in create_grouping_paths(). I would imagine that it might be nice to
>> also modify the other places which perform a similar calculation to
>> use that function, but I don't think that needs to be done for this
>> patch... perhaps a follow-on cleanup.
>
> Hmmm, so how many places that could use the new function are there? I've
> only found these two:
>
>    create_distinct_paths (planner.c)
>    choose_hashed_setop (prepunion.c)
>
> That doesn't seem like an extremely high number, so perhaps doing it in this
> patch would be fine. However if the function is defined as static,
> choose_hashed_setop won't be able to use it anyway (well, it'll have to move
> the prototype into a header and remove the static).
>
> I wonder why we would need to allow cost_agg=NULL, though? I mean, if there
> are no costing information, wouldn't it be better to either raise an error,
> or at the very least do something like:
>
>     } else
>         hashentrysize += hash_agg_entry_size(0);
>
> which is what create_distinct_paths does?


Yes, it should do that... My mistake. I've ended up just removing the
NULL check as I don't want to touch create_distinct_paths() in this
patch. I'd rather leave that as a small cleanup patch for later,
although the code in create_distinct_paths() is more simple without
the aggregate sizes being calculated, so there's perhaps less of a
need to use the helper function there. If that cleanup patch
materialises then the else hashentrysize += hash_agg_entry_size(0);
can be added with that patch.

> I'm not sure changing the meaning of enable_hashagg like this is a good
> idea. It worked as a hard switch before, while with this change that would
> not be the case. Or more accurately - it would not be the case for
> aggregates, but it would still work the old way for other types of plans.
> Not sure that's a particularly good idea.

Hmm, I don't see how it was a hard switch before. If we were unable to
sort by the group by clause then hashagg would magically be enabled.
The reason I did this was to simplify the logic in
create_grouping_paths(). What difference do you imagine that there
actually is here?

The only thing I can think of is; we now generate a hashagg path where
we previously didn't. This has disable_cost added to the startup_cost
so is quite unlikely to win. Perhaps there is some differences if
someone did SET enable_sort = false; SET enable_hashagg = false; I'm
not sure if we should be worried there though. Also maybe there's
going to be a difference if the plan costings were so high that
disable_cost was drowned out by the other costings.

Apart from that, It would actually be nice to be consistent with this
enable_* GUCs, as to my knowledge the others all just add disable_cost
to the startup_cost of the path.

> What about introducing a GUC to enable parallel aggregate, while still
> allowing other types of parallel nodes (I guess that would be handy for
> other types of parallel nodes - it's a bit blunt tool, but tweaking
> max_parallel_degree is even blumter)? e.g. enable_parallelagg?

Haribabu this in his version of the patch and I didn't really
understand the need for it, I assumed it was for testing only. We
don't have enable_parallelseqscan, and would we plan on adding GUCs
each time we enable a node for parallelism? I really don't think so,
we already have parallel hash join and nested loop join without GUCs
to disable them. I see no reason to add them there, and I also don't
here.

> I do also have a question about parallel aggregate vs. work_mem. Nowadays we
> mostly say to users a query may allocate a multiple of work_mem, up to one
> per node in the plan. Apparently with parallel aggregate we'll have to say
> "multiplied by number of workers", because each aggregate worker may
> allocate up to hashaggtablesize. Is that reasonable? Shouldn't we restrict
> the total size of hash tables in all workers somehow?

I did think about this, but thought, either;

1) that a project wide decision should be made on how to handle this,
not just for parallel aggregate, but parallel hash join too, which as
I understand it, for now it builds an identical hashtable per worker.
2) the limit is per node, per connection, and parallel aggregates have
multiple connections, so we might not be breaking our definition of
how to define work_mem, since we're still limited by max_connections
anyway.

> create_grouping_paths also contains this comment:
>
>  /*
>   * Generate HashAgg Path providing the estimated hash table size is not
>   * too big, although if no other Paths were generated above, then we'll
>   * begrudgingly generate one so that we actually have a Path to work
>   * with.
>   */
>
> I'm not sure this is a particularly clear comment, I think the old one was
> way much informative despite being a single line:
>
> /* Consider reasons to disable hashing, but only if we can sort instead */

hmm, I find it quite clear, but perhaps that's because I wrote the
code. I'm not really sure what you're not finding clear about it to be
honest. Tom's original comment was quite generic to allow for more
reasons, but I removed one of those reasons by simplifying the logic
around enable_hashagg, so I didn't think Tom's comment suited well
anymore.

I've rewritten the comment to become:

/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/

How about that?

> BTW create_grouping_paths probably grew to a size when splitting it into
> smaller pieces would be helpful?

I'd rather not. Amit mentioned this too [1]. See 4A. Robert has marked
it as ready for committer, so I really don't want to start hacking it
up too much at this stage unless Robert requests so.

An updated patch is attached. This hopefully addresses your concerns
with the comment, and also the estimate_hashagg_tablesize() NULL
checking.

[1] 
http://www.postgresql.org/message-id/CAKJS1f80=f-z1CUU7=QDmn0r=_yeu7pan2dz6rqsnupfefo...@mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

0001-Allow-aggregation-to-happen-in-parallel_2016-03-21.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Aggregate

Reply via email to