Attached patchset with all the review comments reported so far. On Fri, Dec 1, 2017 at 6:11 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote:
> Continuing with review of 0007. > > + > + /* Copy input rels's relids to grouped rel */ > + grouped_rel->relids = input_rel->relids; > > Isn't this done in fetch_upper_rel()? Why do we need it here? > There's also a similar hunk in create_grouping_paths() which doesn't look > appropriate. I guess, you need relids in grouped_rel->relids for FDW. > There are > two ways to do this: 1. set grouped_rel->relids for parent upper rel as > well, > but then we should pass relids to fetch_upper_rel() instead of setting > those > later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids > to > all_baserels, if upper_rel->relids is NULL and don't set relids for a > parent > upper rel. I am fine with either of those. > Done. Opted second option. > > + /* partial phase */ > + get_agg_clause_costs(root, (Node *) partial_target->exprs, > + AGGSPLIT_INITIAL_SERIAL, > + &agg_partial_costs); > > IIUC, the costs for evaluating aggregates would not change per child. They > won't be different for parent and any of the children. So, we should be > able to > calculate those once, save in "extra" and use repeatedly. > Yep. Done. > > + if (can_sort) > + { > + Path *path = cheapest_path; > + > + if (!(pathkeys_contained_in(root->group_pathkeys, > + path->pathkeys))) > [ .. clipped patch .. ] > + NIL, > + dNumGroups)); > + } > > We create two kinds of paths partial paths for parallel query and partial > aggregation paths when group keys do not have partition keys. The comments > and > code uses partial to mean both the things, which is rather confusing. May > be we > should use term "partial aggregation" explicitly wherever it means that in > comments and in variable names. > Agree. Used "partial aggregation" wherever applicable. Let me know if you see any other place need this adjustments. > I still feel that create_grouping_paths() and create_child_grouping_paths() > have a lot of common code. While I can see that there are some pockets in > create_grouping_paths() which are not required in > create_child_grouping_paths() > and vice-versa, may be we should create only one function and move those > pockets under "if (child)" or "if (parent)" kind of conditions. It will be > a > maintenance burden to keep these two functions in sync in future. If we do > not > keep them in sync, that will introduce bugs. > Agree that keeping these two functions in sync in future will be a maintenance burden, but I am not yet sure how to refactor them cleanly. Will give one more try and update those changes in the next patchset. > + > +/* > + * create_partition_agg_paths > + * > + * Creates append path for all the partitions and adds into the grouped > rel. > > I think you want to write "Creates an append path containing paths from > all the > child grouped rels and adds into the given parent grouped rel". > Reworded as you said. > > + * For partial mode we need to add a finalize agg node over append path > before > + * adding a path to grouped rel. > + */ > +void > +create_partition_agg_paths(PlannerInfo *root, > + RelOptInfo *grouped_rel, > + List *subpaths, > + List *partial_subpaths, > > Why do we have these two as separate arguments? I don't see any call to > create_partition_agg_paths() through add_append_path() passing both of them > non-NULL simultaneously. May be you want use a single list subpaths and > another > boolean indicating whether it's list of partial paths or regular paths. > > After redesigning in the area of putting gather over append, I don't need to pass all Append subpaths to this function at-all. Append is done by add_paths_to_append_rel() itself. This function now just adds fanalization steps as needed. So, we don't have two lists now. And to know about partial paths, passed a boolean instead. Please have a look and let me know if I missed any. > + > + /* For non-partial path, just create a append path and we are done. */ > This is the kind of confusion, I am talking about above. Here you have > mentioned "non-partial path" which may mean a regular path but what you > actually mean by that term is a path representing partial aggregates. > > + /* > + * Partial partition-wise grouping paths. Need to create final > + * aggregation path over append relation. > + * > + * If there are partial subpaths, then we need to add gather path > before we > + * append these subpaths. > > More confusion here. > Hopefully no more confusion in this new version. > + */ > + if (partial_subpaths) > + { > + ListCell *lc; > + > + Assert(subpaths == NIL); > + > + foreach(lc, partial_subpaths) > + { > + Path *path = lfirst(lc); > + double total_groups = path->rows * > path->parallel_workers; > + > + /* Create gather paths for partial subpaths */ > + Path *gpath = (Path *) create_gather_path(root, grouped_rel, > path, > + path->pathtarget, > NULL, > + &total_groups); > + > + subpaths = lappend(subpaths, gpath); > > Using the argument variable is confusing and that's why you need two > different > List variables. Instead probably you could have another variable local to > this > function to hold the gather subpaths. > Done. > > AFAIU, the Gather paths that this code creates has its parent set to > parent grouped > rel. That's not correct. These partial paths come from children of grouped > rel > and each gather is producing rows corresponding to one children of grouped > rel. > So gather path's parent should be set to corresponding child and not parent > grouped rel. > Yep. > > This code creates plans where there are multiple Gather nodes under an > Append > node. AFAIU, the workers assigned to one gather node can be reused until > that > Gather node finishes. Having multiple Gather nodes under an Append mean > that > every worker will be idle from the time that worker finishes the work till > the > last worker finishes the work. That doesn't seem to be optimal use of > workers. > The plan that we create with Gather on top of Append seems to be better. > So, we > should avoid creating one Gather node per child plans. Have we tried to > compare > performance of these two plans? > Agree. Having Gather on top of the Append is better. Done that way. It resolves your previous comment too. > + if (!IsA(apath, MergeAppendPath) && root->group_pathkeys) > + { > + spath = (Path *) create_sort_path(root, > + grouped_rel, > + apath, > + root->group_pathkeys, > + -1.0); > + } > > The code here assumes that a MergeAppend path will always have pathkeys > matching group_pathkeys. I believe that's true but probably we should have > an > Assert to make it clear and add comments. If that's not true, we will need > to > sort the output of MergeAppend OR discard MergeAppend paths which do not > have > pathkeys matching group_pathkeys. > Oops. Thanks for pointing out that. You are correct. Added relevant check which checks for required pathkeys present or not. > > > diff --git a/src/tools/pgindent/typedefs.list > b/src/tools/pgindent/typedefs.list > index b422050..1941468 100644 > --- a/src/tools/pgindent/typedefs.list > +++ b/src/tools/pgindent/typedefs.list > @@ -2345,6 +2345,7 @@ UnlistenStmt > UnresolvedTup > UnresolvedTupData > UpdateStmt > +UpperPathExtraData > UpperRelationKind > UpperUniquePath > UserAuth > > Do we commit this file as part of the feature? > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > This patchset contains fixes for other review comments too. Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
partition-wise-agg-v9.tar.gz
Description: GNU Zip compressed data