Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )
Change subject: IMPALA-10891: Avoid hash exchanges in more situations ...................................................................... Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG@8 PS3, Line 8: High level design comments: I think that the changes should be mainly in the planner, not in the analyzer. Deciding which expressions should be used in an exchange node's hash key before join nodes could be done when these exchange nodes are created: https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L426 Deciding when to create an extra partitioned fragment for group by is done here: https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L910 We could accept it as "good enough" partitioning if the child fragment's partitioning is a subset of the group by and has a high enough NDV http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG@16 PS3, Line 16: Testing Did you run the full test suite? You can run the tests on a review with https://jenkins.impala.io/job/gerrit-verify-dryrun-external/ http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@127 PS3, Line 127: partitionExprs_ It seems very counter intuitive to me to change a member in a "get" function. http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@197 PS3, Line 197: != -1 0 could be also checked to avoid division by 0 http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198 PS3, Line 198: numDistinctValues_ / rootTable.getNumRows() Aren't we too strict here? My understanding is that the NDV should be high enough to spread the rows ~evenly among fragment instances. For example an ndv of 100 should be enough to spread to rows evenly among 2 instances even if the number of rows is 1000 http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test File testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test: http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@1 PS3, Line 1: optimize Can you mention what did we optimize in this query exactly? My understanding is that we have a cheaper hash function in node 02 (hash id vs hash id, bool_col). http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@24 PS3, Line 24: optimize Can you mention what did we optimize in this query exactly? My understanding is that we have a cheaper hash function in node 04/05 (hash id vs hash id, bigint_col) and skipping a preagregate + shuffle nodes between 02 and 03 http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@56 PS3, Line 56: ==== Are there other nodes modified by this change than join and grouping aggregate? -- To view, visit http://gerrit.cloudera.org:8080/19030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f Gerrit-Change-Number: 19030 Gerrit-PatchSet: 3 Gerrit-Owner: Xiaoqing Gao <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Fri, 23 Sep 2022 18:58:29 +0000 Gerrit-HasComments: Yes
