On Thu, Mar 8, 2018 at 1:15 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Mar 7, 2018 at 8:07 PM, Jeevan Chalke > <jeevan.cha...@enterprisedb.com> wrote: > Here are some more review comments esp. on > try_partitionwise_grouping() function. BTW name of that function > doesn't go in sync with enable_partitionwise_aggregation (which btw is > in sync with enable_fooagg GUCs). But it goes in sync with > create_grouping_paths(). It looks like we have already confused > aggregation and grouping e.g. enable_hashagg may affect path creation > when there is no aggregation involved i.e. only grouping is involved > but create_grouping_paths will create paths when there is no grouping > involved. Generally it looks like the function names use grouping to > mean both aggregation and grouping but GUCs use aggregation to mean > both of those. So, the naming in this patch looks consistent with the > current naming conventions. > > + grouped_rel->part_scheme = input_rel->part_scheme; > + grouped_rel->nparts = nparts; > + grouped_rel->boundinfo = input_rel->boundinfo; > + grouped_rel->part_rels = part_rels; > > You need to set the part_exprs which will provide partition keys for this > partitioned relation. I think, we should include all the part_exprs of > input_rel which are part of GROUP BY clause. Since any other expressions in > part_exprs are not part of GROUP BY clause, they can not appear in the > targetlist without an aggregate on top. So they can't be part of the > partition > keys of the grouped relation. > > In create_grouping_paths() we fetch both partial as well as fully grouped > rel > for given input relation. But in case of partial aggregation, we don't need > fully grouped rel since we are not computing full aggregates for the > children. > Since fetch_upper_rel() creates a relation when one doesn't exist, we are > unnecessarily creating fully grouped rels in this case. For thousands of > partitions that's a lot of memory wasted. > > I see a similar issue with create_grouping_paths() when we are computing > only > full aggregates (either because partial aggregation is not possible or > because > parallelism is not possible). In that case, we unconditionally create > partially > grouped rels. That too would waste a lot of memory. > > I think that partially_grouped_rel, when required, is partitioned > irrespective > of whether we do full aggregation per partition or not. So, if we have its > part_rels and other partitioning properties set. I agree that right now we > won't use this information anywhere. It may be useful, in case we device a > way > to use partially_grouped_rel directly without using grouped_rel for > planning > beyond grouping, which seems unlikely. > > + > + /* > + * Parallel aggregation requires partial target, so compute it > here > + * and translate all vars. For partial aggregation, we need it > + * anyways. > + */ > + partial_target = make_partial_grouping_target(root, target, > + havingQual); > > Don't we have this available in partially_grouped_rel? > > That shows one asymmetry that Robert's refactoring has introduced. We > don't set > reltarget of grouped_rel but set reltarget of partially_grouped_rel. If > reltarget of grouped_rel changes across paths (the reason why we have not > set > it in grouped_rel), shouldn't reltarget of partially grouped paths change > accordingly? > I am not sure why we don't set reltarget into the grouped_rel too. But if we do so like we did in partially_grouped_rel, then it will be lot easier for partitionwise aggregate as then we don't have to pass target to functions creating paths like create_append_path. We now need to update generate_gather_paths() to take target too as it is now being called on grouped_rel in which reltarget is not set. But yes, if there is any specific reason we can't do so, then I think the same like Ashutosh Said. I didn't aware of such reason though. > + > +/* > + * group_by_has_partkey > + * > + * Returns true, if all the partition keys of the given relation are part > of > + * the GROUP BY clauses, false otherwise. > + */ > +static bool > +group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target, > + List *groupClause) > > We could modify this function to return the list of part_exprs which are > part > of group clause and use that as the partition keys of the grouped_rel if > required. If group by doesn't have all the partition keys, the function > would > return a NULL list. > > Right now, in case of full aggregation, partially_grouped_rel is populated > with > the partial paths created by adding partial aggregation to the partial > paths of > input relation. But we are not trying to create partial paths by (parallel) > appending the (non)partial paths from the child partially_grouped_rel. > Have we > thought about that? Would such paths have different shapes from the ones > that > we create now and will they be better? > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company