Alex Behm has posted comments on this change. Change subject: IMPALA-4794: Grouping distinct agg plan robust to data skew ......................................................................
Patch Set 3: (6 comments) Looks good, just minor polish of the commit msg. http://gerrit.cloudera.org:8080/#/c/7643/3//COMMIT_MSG Commit Message: Line 11: plan partitions data between phase-1 and phase-2 by grouping expr and the grouping exprs Line 12: the data skewness on grouping expr directly impacts performance. The make this statement about skew a separate sentence Line 13: new plan partitions data by both grouping expr and distinct aggregation by both the grouping and distinct agg exprs Line 14: expr, then adds one more aggregation and exchange node. It is supposed Try to avoid descriptions like "supposed to be". We should test and understand the impact of changes clearly and state it with confidence. I suggest rephrasing to something like: The new plan is more robust to data skew but does more work than the old plan. Line 19: exchange node, followed by an additional merge aggregation and exchange the first exchange node http://gerrit.cloudera.org:8080/#/c/7643/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 917: > I think that only works with grouping expr and is at line 922. Not sure how I missed that line. You are right, of course. -- To view, visit http://gerrit.cloudera.org:8080/7643 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-HasComments: Yes
