[Impala-ASF-CR] IMPALA-4794: Partition distinct expr for skew data

Alex Behm (Code Review) Fri, 11 Aug 2017 17:49:59 -0700

Alex Behm has posted comments on this change.

Change subject: IMPALA-4794: Partition distinct expr for skew data
......................................................................



Patch Set 1:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/7643/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4794: Partition distinct expr for skew data
Summary msg is not very clear to me, e.g. "for data skew" should be "handling 
data skew". I suggest something like:

IMPALA-4794: Grouping distinct agg plan robust to data skew


Line 9: Currently in an aggregation with grouping and distinct expr, there
Needs cleanup for grammar and wording. Writing crisp prose is very important 
for readers.

Here are some pointers. The first sentence should describe the problem and 
high-level solution. For example:

This patch changes the query plan for grouping distinct aggregations to be more 
robust to data skew in the grouping expressions.

The existing plan ... etc.

The new plan ... etc.


Line 19: Testing: In planner test, some plans with distinct aggregation change.
Testing: Modified existing planner tests which already provide sufficient 
coverage.


http://gerrit.cloudera.org:8080/#/c/7643/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

Line 861:   private PlanFragment 
createPhase2DistinctAggregationFragment(AggregationNode node,
please rename 'node' to 'phase2AggNode' for clarity


Line 864:     ArrayList<Expr> groupingExprs = 
node.getAggInfo().getGroupingExprs();
Clean up: these groupingExprs are not needed anymore, and neither is the 
hasGrouping variable. Remove these and directly check 
node.getAggInfo().getGroupingExprs().isEmpty() in L924


Line 867:     // The first-phase aggregation node is already in the child 
fragment.
first-phase -> phase-1


Line 871:     List<Expr> partitionExprs = null;
combine with L883


Line 875:     // - merge fragment, hash-partitioned on grouping and distinct 
exprs:
first merge fragment


Line 876:     //   * merge agg of phase 1
let's use "phase-N" consistently (with hyphen)


Line 878:     // - merge fragment 2, partitioned on grouping exprs or 
unpartitioned with no grouping
second merge fragment


Line 887:     PlanFragment mergeFragment = null;
use firstMergeFragment and secondMergeFragment variables to make the 
distinction clearer


Line 917:     if (mergeFragment != childFragment) fragments.add(mergeFragment);
move at the end of the the else block in L895


Line 920:     // Any limit should be placed in the final merge aggregation node
I think it's clearer to cluster the code for limit setting/unsetting together, 
e.g. in L939 you can do something like:

// Limit should be applied at the final merge aggregation node
phase2MergeAggNode.setLimit(node.getLimit());
node.unsetLimit();


Line 922:     node.unsetLimit();
If we have grouping, we should make the phase2AggNode streaming like in L899


Line 925:       // place the merge aggregation of the 2nd phase in an 
unpartitioned fragment;
remove comment, doesn't add any value


Line 929:         node.getAggInfo().getMergeAggInfo().getGroupingExprs());
indent 4


Line 931:     // add preceding merge fragment at end
not sure what this comment means, I don't think it makes sense after your 
changes


Line 934:       node.getAggInfo().getMergeAggInfo());
indent 4


Line 936:     // Transfer having predicates. If hasGrouping == true, the 
predicates should
adjust comment to reflect the new code


-- 
To view, visit http://gerrit.cloudera.org:8080/7643
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4794: Partition distinct expr for skew data

Reply via email to