Re: [HACKERS] Partition-wise aggregation/grouping

Jeevan Chalke Wed, 07 Mar 2018 06:39:06 -0800

On Wed, Mar 7, 2018 at 1:45 AM, Robert Haas <[email protected]> wrote:


> On Tue, Mar 6, 2018 at 5:31 AM, Jeevan Chalke
> <[email protected]> wrote:
> > This is in-lined with enable_hashagg GUC. Do you think
> > enable_partitionwise_aggregate seems better? But it will be not
> consistent
> > with other GUC names like enable_hashagg then.
>
> Well, if I had my way, enable_hashagg would be spelled
> enable_hash_aggregate, too, but I wasn't involved in the project that
> long ago.  100% consistency is hard to achieve here; the perfect
> parallel of enable_hashagg would be enable_partitionwiseagg, but then
> it would be inconsistent with enable_partitionwise_join unless we
> renamed it to enable_partitionwisejoin, which I would rather not do.
> I think the way the enable_blahfoo names were done was kinda
> shortsighted -- it works OK as long as blahfoo is pretty short, like
> mergejoin or hashagg or whatever, but if you have more or longer words
> then I think it's hard to see where the word boundaries are without
> any punctuation.  And if you start abbreviating then you end up with
> things like enable_pwagg which are not very easy to understand.  So I
> favor spelling everything out and punctuating it.
>

Understood and make sense.
Updated.


>
> > So the code for doing partially aggregated partial paths and partially
> > aggregated non-partial path is same except partial paths goes into
> > partial_pathlist where as non-partial goes into pathlist of
> > partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
> > when isPartialAgg = true seems correct. Also as we have decided, this
> > function is responsible to create all partially aggregated paths
> including
> > both partial and non-partial.
> >
> > Am I missing something?
>
> Hmm.  I guess not.  I think I didn't read this code well enough
> previously.  Please find attached proposed incremental patches (0001
> and 0002) which hopefully make the code in this area a bit clearer.
>

Yep. Thanks for these patches.
I have merged these changes into my main (0007) patch now.


>
> >> +       /*
> >> +        * If there are any fully aggregated partial paths present,
> >> may be because
> >> +        * of parallel Append over partitionwise aggregates, we must
> stick
> >> a
> >> +        * Gather or Gather Merge path atop the cheapest partial path.
> >> +        */
> >> +       if (grouped_rel->partial_pathlist)
> >>
> >> This comment is copied from someplace where the code does what the
> >> comment says, but here it doesn't do any such thing.
> >
> > Well, these comments are not present anywhere else than this place. With
> > Parallel Append and Partitionwise aggregates, it is now possible to have
> > fully aggregated partial paths now. And thus we need to stick a Gather
> > and/or Gather Merge atop cheapest partial path. And I believe the code
> does
> > the same. Am I missing something?
>
> I misread the code.  Sigh.  I should have waited until today to send
> that email and taken time to study it more carefully.  But I still
> don't think it's completely correct.  It will not consider using a
> pre-sorted path; the only strategies it can consider are cheapest path
> + Gather and cheapest path + explicit Sort (even if the cheapest path
> is already correctly sorted!) + Gather Merge.  It should really do
> something similar to what add_paths_to_partial_grouping_rel() already
> does: first call generate_gather_paths() and then, if the cheapest
> partial path is not already correctly sorted, also try an explicit
> Sort + Gather Merge.  In fact, it looks like we can actually reuse
> that logic exactly.  See attached 0003 incremental patch; this changes
> the outputs of one of your regression tests, but the new plan looks
> better.
>

This seems like a refactoring patch and thus added as separate patch (0005)
in patch-set.
Changes related to PWA patch are merged accordingly too.

Attached new patch-set with these changes merged and fixing review comments
from Ashutosh Bapat along with his GroupPathExtraData changes patch.


> Some other notes:
>
> There's a difference between performing partial aggregation in the
> same process and performing it in a different process.  hasNonPartial
> tells us that we can't perform partial aggregation *at all*;
> hasNonSerial only tells us that partial and final aggregation must
> happen in the same process.  This patch could possibly take advantage
> of partial aggregation even when hasNonSerial is set.  Finalize
> Aggregate -> Append -> N copies of { Partial Aggregate -> Whatever }
> is OK with hasNonSerial = true as long as hasNonPartial = false.  Now,
> the bad news is that for this to actually work we'd need to define new
> values of AggSplit, like AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL and
> AGGSPLIT_FINAL = AGGSPLITOP_COMBINE, and I'm not sure how much
> complexity that adds.  However, if we're not going to do that, I think
> we'd better at last add some comments about it suggesting that someone
> might want to do something about it in the future.
>

Am I not familier with these much. So will add a comment as you said.


>
> I think that, in general, it's a good idea to keep the number of times
> that create_grouping_paths() does something which is conditional on
> whether child_data is NULL to a minimum.  I haven't looked at what
> Ashutosh tried to do there so I don't know whether it's good or bad,
> but I like the idea, if we can do it cleanly.
>
> It strikes me that we might want to consider refactoring things so
> that create_grouping_paths() takes the grouping_rel and
> partial_grouping_rel as input arguments.  Right now, the
> initialization of the child grouping and partial-grouping rels is
> partly in try_partitionwise_aggregate(), which considers marking one
> of them (but never both?) as dummy rels and create_grouping_paths()
> which sets reloptkind, serverid, userid, etc.  The logic of all of
> this is a little unclear to me.  Presumably, if the input rel is
> dummy, then both the grouping_rel and the partial_grouping_rel are
> also dummy.  Also, presumably we should set the reloptkind correctly
> as soon as we create the rel, not at some later stage.
>
> Or maybe what we should do is split create_grouping_paths() into two
> functions.  Like this:
>
>     if (child_data)
>     {
>         partial_grouping_target = child_data->partialRelTarget;
>         partially_grouped_rel->reltarget = partial_grouping_target;
>         agg_partial_costs = child_data->agg_partial_costs;
>         agg_final_costs = child_data->agg_final_costs;
>     }
>
> --- SPLIT IT HERE ---
>
>     /* Apply partitionwise aggregation technique, if possible. */
>     try_partitionwise_grouping(root, input_rel, grouped_rel,
>                                partially_grouped_rel, target,
>                                partial_grouping_target, agg_costs,
>                                agg_partial_costs, agg_final_costs, gd,
>                                can_sort, can_hash, havingQual,
> isPartialAgg);
>
> It seems to me that everything from that point to the end is doing the
> path generation and it's all pretty much the same for the parent and
> child cases.  But everything before that is either stuff that doesn't
> apply to the child case at all (like the degenerate grouping case) or
> stuff that should be done once and passed down (like
> can_sort/can_hash).  The only exception I see is some of the stuff
> that sets up the upper rel at the top of the function, but maybe that
> logic could be refactored into a separate function as well (like
> initialize_grouping_rel).  Then, instead of try_partitionwise_join()
> actually calling create_grouping_paths(), it would call
> initialize_grouping_rel() and then the path-adding function that we
> split off from the bottom of the current create_grouping_paths(),
> basically skipping all that stuff in the middle that we don't really
> want to do in that case.
>

I will have a look over this proposal.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

partition-wise-agg-v16.tar.gz
Description: application/gzip

Re: [HACKERS] Partition-wise aggregation/grouping

Reply via email to