Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG@8
PS3, Line 8:
High level design comments:

I think that the changes should be mainly in the planner, not in the analyzer.

Deciding which expressions should be used in an exchange node's hash key before 
join nodes could be done when these exchange nodes are created: 
https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L426

Deciding when to create an extra partitioned fragment for group by is done 
here: 
https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L910
We could accept it as "good enough" partitioning if the child fragment's 
partitioning is a subset of the group by and has a high enough NDV


http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG@16
PS3, Line 16: Testing
Did you run the full test suite? You can run the tests on a review with 
https://jenkins.impala.io/job/gerrit-verify-dryrun-external/


http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@127
PS3, Line 127: partitionExprs_
It seems very counter intuitive to me to change a member in a "get" function.


http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@197
PS3, Line 197:  != -1
0 could be also checked to avoid division by 0


http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198
PS3, Line 198: numDistinctValues_ / rootTable.getNumRows()
Aren't we too strict here? My understanding is that the NDV should be high 
enough to spread the rows ~evenly among fragment instances. For example an ndv 
of 100 should be enough to spread to rows evenly among 2 instances even if the 
number of rows is 1000


http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test:

http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@1
PS3, Line 1: optimize
Can you mention what did we optimize in this query exactly? My understanding is 
that we have a cheaper hash function in node 02 (hash id vs hash id, bool_col).


http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@24
PS3, Line 24: optimize
Can you mention what did we optimize in this query exactly? My understanding is 
that we have a cheaper hash function in node 04/05 (hash id vs hash id, 
bigint_col) and skipping a preagregate + shuffle nodes between 02 and 03


http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@56
PS3, Line 56: ====
Are there other nodes modified by this change than join and grouping aggregate?



--
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Fri, 23 Sep 2022 18:58:29 +0000
Gerrit-HasComments: Yes

Reply via email to