Re: [HACKERS] Partition-wise aggregation/grouping

Ashutosh Bapat Thu, 28 Sep 2017 02:43:23 -0700

On Wed, Sep 27, 2017 at 3:42 PM, Jeevan Chalke
<jeevan.cha...@enterprisedb.com> wrote:
> Thanks Ashutosh for reviewing.
>
> Attached new patch-set with following changes:
>
> 1. Removed earlier 0007 and 0008 patches which were PoC for supporting
> partial aggregation over fdw. I removed them as it will be a different
> issue altogether and hence I will tackle them separately once this is
> done.
>
> This patch-set now includes support for parallel plans within partitions.
>
> Notes:
> HEAD: 59597e6
> Partition-wise Join Version: 34
>
> (First six patches 0001 - 0006, remains the same functionality-wise)
> 0007 - Refactors partial grouping paths creation into the separate function.
> 0008 - Enables parallelism within the partition-wise aggregation.
>
> This patch also includes a fix for the crash reported by Rajkumar.
> While forcibly applying scan/join target to all the Paths for the scan/join
> rel, earlier I was using apply_projection_to_path() which modifies the path
> in-place which causing this crash as the path finally chosen has been
> updated by partition-wise agg path creation. Now I have used
> create_projection_path() like we do in partial aggregation paths.
>
> Also, fixed issues reported by Ashutosh.


Thanks.

Here are comments on 0004 from last patch set. But most the comments
still apply.

Patch 0001 adds functions create_hash_agg_path() and create_sort_agg_path().
Patch 0004 adds a new argument to those functions for conditions in HAVING
clause. We should move those changes to 0001 and pass parse->havingQual to
these functions in 0001 itself. That will keep all changes to those functions
together and also make 0003 small.

The prologue of try_partition_wise_grouping() mentions a restriction of
partition keys being leading group by clauses. This restriction is not
specified in the prologue of have_grouping_by_partkey(), which actually checks
for this restriction. The requirement per prologue of that function is just to
have partition keys in group clause. I think have_grouping_by_partkey() is
correct, and we should change prologue of try_partition_wise_grouping() to be
in sync with have_grouping_by_partkey(). The prologue explains why
partition-wise aggregation/grouping would be efficient with this restriction,
but it doesn't explain why partial aggregation/grouping per partition would be
efficient. May be it should do that as well. Similar is the case with partial
aggregation/grouping discussion in README.

+    /* Do not handle grouping sets for now. */
Is this a real restriction or just restriction for first cut of this feature?
Can you please add more explanation? May be update README as well?

+    grouped_rel->part_scheme = input_rel->part_scheme;
Is this true even for partial aggregates? I think not. Since group by clause
does not contain partition keys, the rows from multiple partitions participate
in one group and thus the partition keys of input relation do not apply to the
grouped relation. In this case, it seems that the grouped rel will have
part_rels but will not be partitioned.

+        /*
+         * If there is no path for the child relation, then we cannot add
+         * aggregate path too.
+         */
+        if (input_child_rel->pathlist == NIL)
+            return;
When can this happen? I think, similar to partition-wise join it happens when
there is a dummy parent relation. See [1]. If that's the case, you may want to
do things similar to what partition-wise join is doing. If there's some other
reason for doing this, returing from here half-way is actually waste of memory
and planning time. Instead, we may want to loop over the part_rels to find if
any of those have empty pathlist and return from there before doing any actual
work.

+        extra.pathTarget = child_target;
+        extra.inputRows = input_child_rel->cheapest_startup_path->rows;
+        extra.havingQual = (Node *) adjust_appendrel_attrs(root,
+                                                           (Node *)
query->havingQual,
+                                                           nappinfos,
+                                                           appinfos);
These lines are updating some fields of "extra" structure in every loop. The
structure is passed to create_child_grouping_paths() in the loop and to
add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel() only
gets some member values for the last child. Is that right? Should we split
extra into two structures one to be used within the loop and one outside? Or
may be send the members being updated within the loop separately?

+        /*
+         * Forcibly apply scan/join target to all the Paths for the scan/join
+         * rel.
+         *
[ lines clipped ]
+                if (subpath == input_child_rel->cheapest_total_path)
+                    input_child_rel->cheapest_total_path = path;
+            }
+        }
This code seems to be copied from grouping_planner() almost verbatim. Is there
a way we can refactor it into a function and use it in both the places.

have_grouping_by_partkey() may use match_expr_to_partition_keys() to find
whether a given clause expression matches any of the partition keys. Or you
could use list_intersection() instead of following loop
+        foreach(lc, partexprs)
+        {
+            Expr       *partexpr = lfirst(lc);
+
+            if (list_member(groupexprs, partexpr))
+            {
+                found = true;
+                break;
+            }
+        }
+        /*
+         * If none of the partition key matches with any of the GROUP BY
+         * expression, return false.
+         */
+        if (!found)
+            return false;

create_child_grouping_paths() and create_grouping_paths() has almost similar
code. Is there a way we could refactor the code to extract common code into a
function called by these two functions or reuse create_grouping_paths() for
children as well? I don't think we will be able to do the later.

+    /* Nothing to do if there is an empty pathlist */
+    if (grouped_rel->pathlist == NIL)
+        return false;
When would that happen? Similar condition in case of parent grouped rel throws
an error, so when this code is called, we know for sure that parent had
non-empty pathlist. So, we would expect child to have non-empty pathlist as
well.

+        grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
+                                      input_rel->relids);
+
+        /* Mark this rel as "other upper rel" */
+        grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
I think we need to pass relopkind as an argument to fetch_upper_rel(), now that
we have "upper" relations and "other upper" relations. relids will still be a
"key" to find an upper relation but its reloptkind should match the given
reloptkind.  fetch_upper_rel() is used to create the upper relation if it
doesn't exist. So, with the above code, if some other function calls
fetch_upper_rel() with given relids, it would get an upper rel with
RELOPT_UPPER_REL and then this code would change it to RELOPT_OTHER_UPPER_REL.
That looks odd. May be we should have written build_upper_rel() and
find_upper_rel() similar to build_join_rel() and find_join_rel() instead of
combining both the functionalities in one function.

+        /*
+         * create_append_path() sets the path target from the given relation.
+         * However, in this case grouped_rel doesn't have a target set.  So we
+         * must set the pathtarget to the passed in target.
+         */
+        apath->pathtarget = target;
I think, we should change create_append_path() functions to accept target as an
additional argument. For append rels other than aggregate and grouping, target
will be same as relation's target. For agg/group append rels, we will pass
different targets for partial and non-partial grouping paths.

+        /*
+         * Since Append's output is always unsorted, we'll need to sort,
+         * unless there's no GROUP BY clause or a degenerate (constant) one,
+         * in which case there will only be a single group.
+         */
append path here can be output of either merge append or append. If it's output
of merge append, we don't need to sort it again, do we?

create_partition_agg_paths() creates append paths and then adds finalization
path if necessary. The code to add finalization path seems to be similar to the
code that adds finalization path for parallel query. May be we could take out
common code into a function and call that function in two places. I see this
function as accepting a partial aggregation/grouping path and returning a path
that finalizes partial aggregates/groups.

[1] 
https://www.postgresql.org/message-id/CAFjFpRd5+zroxY7UMGTR2M=rjbv4abocxqg3+1rbmtplk5m...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partition-wise aggregation/grouping

Reply via email to