Re: [HACKERS] Partition-wise aggregation/grouping

Jeevan Chalke Mon, 01 Jan 2018 23:07:38 -0800

Attached patchset with all the review comments reported so far.

On Fri, Dec 1, 2017 at 6:11 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:


> Continuing with review of 0007.
>
> +
> +    /* Copy input rels's relids to grouped rel */
> +    grouped_rel->relids = input_rel->relids;
>
> Isn't this done in fetch_upper_rel()? Why do we need it here?
> There's also a similar hunk in create_grouping_paths() which doesn't look
> appropriate. I guess, you need relids in grouped_rel->relids for FDW.
> There are
> two ways to do this: 1. set grouped_rel->relids for parent upper rel as
> well,
> but then we should pass relids to fetch_upper_rel() instead of setting
> those
> later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids
> to
> all_baserels, if upper_rel->relids is NULL and don't set relids for a
> parent
> upper rel. I am fine with either of those.
>

Done. Opted second option.


>
> +            /* partial phase */
> +            get_agg_clause_costs(root, (Node *) partial_target->exprs,
> +                                 AGGSPLIT_INITIAL_SERIAL,
> +                                 &agg_partial_costs);
>
> IIUC, the costs for evaluating aggregates would not change per child. They
> won't be different for parent and any of the children. So, we should be
> able to
> calculate those once, save in "extra" and use repeatedly.
>

Yep. Done.


>
> +        if (can_sort)
> +        {
> +            Path       *path = cheapest_path;
> +
> +            if (!(pathkeys_contained_in(root->group_pathkeys,
> +                                        path->pathkeys)))
> [ .. clipped patch .. ]
> +                                           NIL,
> +                                           dNumGroups));
> +        }
>
> We create two kinds of paths partial paths for parallel query and partial
> aggregation paths when group keys do not have partition keys. The comments
> and
> code uses partial to mean both the things, which is rather confusing. May
> be we
> should use term "partial aggregation" explicitly wherever it means that in
> comments and in variable names.
>

Agree. Used "partial aggregation" wherever applicable. Let me know if you
see any other place need this adjustments.


> I still feel that create_grouping_paths() and create_child_grouping_paths()
> have a lot of common code. While I can see that there are some pockets in
> create_grouping_paths() which are not required in
> create_child_grouping_paths()
> and vice-versa, may be we should create only one function and move those
> pockets under "if (child)" or "if (parent)" kind of conditions. It will be
> a
> maintenance burden to keep these two functions in sync in future. If we do
> not
> keep them in sync, that will introduce bugs.
>

Agree that keeping these two functions in sync in future will be a
maintenance burden, but I am not yet sure how to refactor them cleanly.
Will give one more try and update those changes in the next patchset.


> +
> +/*
> + * create_partition_agg_paths
> + *
> + * Creates append path for all the partitions and adds into the grouped
> rel.
>
> I think you want to write "Creates an append path containing paths from
> all the
> child grouped rels and adds into the given parent grouped rel".
>

Reworded as you said.


>
> + * For partial mode we need to add a finalize agg node over append path
> before
> + * adding a path to grouped rel.
> + */
> +void
> +create_partition_agg_paths(PlannerInfo *root,
> +                           RelOptInfo *grouped_rel,
> +                           List *subpaths,
> +                           List *partial_subpaths,
>
> Why do we have these two as separate arguments? I don't see any call to
> create_partition_agg_paths() through add_append_path() passing both of them
> non-NULL simultaneously. May be you want use a single list subpaths and
> another
> boolean indicating whether it's list of partial paths or regular paths.
>
>
After redesigning in the area of putting gather over append, I don't need
to pass all Append subpaths to this function at-all. Append is done by
add_paths_to_append_rel() itself. This function now just adds fanalization
steps as needed.
So, we don't have two lists now. And to know about partial paths, passed a
boolean instead. Please have a look and let me know if I missed any.


> +
> +    /* For non-partial path, just create a append path and we are done. */
> This is the kind of confusion, I am talking about above. Here you have
> mentioned "non-partial path" which may mean a regular path but what you
> actually mean by that term is a path representing partial aggregates.
>

> +    /*
> +     * Partial partition-wise grouping paths.  Need to create final
> +     * aggregation path over append relation.
> +     *
> +     * If there are partial subpaths, then we need to add gather path
> before we
> +     * append these subpaths.
>
> More confusion here.
>

Hopefully no more confusion in this new version.


> +     */
> +    if (partial_subpaths)
> +    {
> +        ListCell   *lc;
> +
> +        Assert(subpaths == NIL);
> +
> +        foreach(lc, partial_subpaths)
> +        {
> +            Path       *path = lfirst(lc);
> +            double        total_groups = path->rows *
> path->parallel_workers;
> +
> +            /* Create gather paths for partial subpaths */
> +            Path *gpath = (Path *) create_gather_path(root, grouped_rel,
> path,
> +                                                      path->pathtarget,
> NULL,
> +                                                      &total_groups);
> +
> +            subpaths = lappend(subpaths, gpath);
>
> Using the argument variable is confusing and that's why you need two
> different
> List variables. Instead probably you could have another variable local to
> this
> function to hold the gather subpaths.
>

Done.


>
> AFAIU, the Gather paths that this code creates has its parent set to
> parent grouped
> rel. That's not correct. These partial paths come from children of grouped
> rel
> and each gather is producing rows corresponding to one children of grouped
> rel.
> So gather path's parent should be set to corresponding child and not parent
> grouped rel.
>

Yep.


>
> This code creates plans where there are multiple Gather nodes under an
> Append
> node. AFAIU, the workers assigned to one gather node can be reused until
> that
> Gather node finishes. Having multiple Gather nodes under an Append mean
> that
> every worker will be idle from the time that worker finishes the work till
> the
> last worker finishes the work. That doesn't seem to be optimal use of
> workers.
> The plan that we create with Gather on top of Append seems to be better.
> So, we
> should avoid creating one Gather node per child plans. Have we tried to
> compare
> performance of these two plans?
>

Agree. Having Gather on top of the Append is better. Done that way. It
resolves your previous comment too.


> +        if (!IsA(apath, MergeAppendPath) && root->group_pathkeys)
> +        {
> +            spath = (Path *) create_sort_path(root,
> +                                              grouped_rel,
> +                                              apath,
> +                                              root->group_pathkeys,
> +                                              -1.0);
> +        }
>
> The code here assumes that a MergeAppend path will always have pathkeys
> matching group_pathkeys. I believe that's true but probably we should have
> an
> Assert to make it clear and add comments. If that's not true, we will need
> to
> sort the output of MergeAppend OR discard MergeAppend paths which do not
> have
> pathkeys matching group_pathkeys.
>

Oops. Thanks for pointing out that. You are correct.
Added relevant check which checks for required pathkeys present or not.


>
>
> diff --git a/src/tools/pgindent/typedefs.list
> b/src/tools/pgindent/typedefs.list
> index b422050..1941468 100644
> --- a/src/tools/pgindent/typedefs.list
> +++ b/src/tools/pgindent/typedefs.list
> @@ -2345,6 +2345,7 @@ UnlistenStmt
>  UnresolvedTup
>  UnresolvedTupData
>  UpdateStmt
> +UpperPathExtraData
>  UpperRelationKind
>  UpperUniquePath
>  UserAuth
>
> Do we commit this file as part of the feature?
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

This patchset contains fixes for other review comments too.

Thanks
-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

partition-wise-agg-v9.tar.gz
Description: GNU Zip compressed data

Re: [HACKERS] Partition-wise aggregation/grouping

Reply via email to