Re: [HACKERS] Parallel Aggregate

Tomas Vondra Sun, 20 Mar 2016 13:48:11 -0700

Hi,

On 03/20/2016 09:58 AM, David Rowley wrote:

On 20 March 2016 at 03:19, Robert Haas <robertmh...@gmail.com> wrote:

On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
<david.row...@2ndquadrant.com> wrote:

Updated patch is attached.


I think this looks structurally correct now, and I think it's doing
the right thing as far as parallelism is concerned.  I don't see any
obvious problems in the rest of it, either, but I haven't thought
hugely deeply about the way you are doing the costing, nor have I
totally convinced myself that all of the PathTarget and setrefs stuff
is correct.  But I think it's probably pretty close.  I'll study it
some more next week.


Thank you for the reviews. The only thing I can think to mention which
I've not already is that I designed estimate_hashagg_tablesize() to be
reusable in various places in planner.c, yet I've only made use of it
in create_grouping_paths(). I would imagine that it might be nice to
also modify the other places which perform a similar calculation to
use that function, but I don't think that needs to be done for this
patch... perhaps a follow-on cleanup.

Hmmm, so how many places that could use the new function are there? I'veonly found these two:


   create_distinct_paths (planner.c)
   choose_hashed_setop (prepunion.c)

That doesn't seem like an extremely high number, so perhaps doing it inthis patch would be fine. However if the function is defined as static,choose_hashed_setop won't be able to use it anyway (well, it'll have tomove the prototype into a header and remove the static).

I wonder why we would need to allow cost_agg=NULL, though? I mean, ifthere are no costing information, wouldn't it be better to either raisean error, or at the very least do something like:


    } else
        hashentrysize += hash_agg_entry_size(0);

which is what create_distinct_paths does?

I'm not sure changing the meaning of enable_hashagg like this is a goodidea. It worked as a hard switch before, while with this change thatwould not be the case. Or more accurately - it would not be the case foraggregates, but it would still work the old way for other types ofplans. Not sure that's a particularly good idea.

What about introducing a GUC to enable parallel aggregate, while stillallowing other types of parallel nodes (I guess that would be handy forother types of parallel nodes - it's a bit blunt tool, but tweakingmax_parallel_degree is even blumter)? e.g. enable_parallelagg?

I do also have a question about parallel aggregate vs. work_mem.Nowadays we mostly say to users a query may allocate a multiple ofwork_mem, up to one per node in the plan. Apparently with parallelaggregate we'll have to say "multiplied by number of workers", becauseeach aggregate worker may allocate up to hashaggtablesize. Is thatreasonable? Shouldn't we restrict the total size of hash tables in allworkers somehow?


create_grouping_paths also contains this comment:

 /*
  * Generate HashAgg Path providing the estimated hash table size is not
  * too big, although if no other Paths were generated above, then we'll
  * begrudgingly generate one so that we actually have a Path to work
  * with.
  */

I'm not sure this is a particularly clear comment, I think the old onewas way much informative despite being a single line:


/* Consider reasons to disable hashing, but only if we can sort instead */

BTW create_grouping_paths probably grew to a size when splitting it intosmaller pieces would be helpful?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Aggregate

Reply via email to