Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19511 )
Change subject: IMPALA-10861: Optimize the plan for identical predicates
......................................................................


Patch Set 3:

(2 comments)

Sorry for the late feedback. Looks good to me, just one or two questions.

http://gerrit.cloudera.org:8080/#/c/19511/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/19511/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@1265
PS3, Line 1265:     for (C expr: origList) {
If my understanding is correct, the time complexity can be O(n^2) in the worst case, because every conjunct would need to call removeDuplicates(). Do you think it is necessary to optimize it?


http://gerrit.cloudera.org:8080/#/c/19511/3/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/joins.test:

http://gerrit.cloudera.org:8080/#/c/19511/3/testdata/workloads/functional-planner/queries/PlannerTest/joins.test@3122
PS3, Line 3122: ON c.c_custkey = l.l_orderkey and c.c_custkey = l.l_orderkey
I tried the query below. The duplicates should be removed from the "other predicates", but the result is not as expected.

Query:
explain SELECT c_custkey from tpch.customer c
  left outer join tpch.lineitem l ON c.c_custkey = l.l_orderkey
  where l.l_discount > c.c_acctbal and c.c_acctbal < l.l_discount

+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=30.50MB Threads=6                 |
| Per-Host Resource Estimates: Memory=819MB                                   |
|                                                                             |
| PLAN-ROOT SINK                                                              |
| |                                                                           |
| 05:EXCHANGE [UNPARTITIONED]                                                 |
| |                                                                           |
| 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                                |
| |  hash predicates: l.l_orderkey = c.c_custkey                              |
| |  other predicates: c.c_acctbal < l.l_discount, l.l_discount > c.c_acctbal |
| |  runtime filters: RF000 <- c.c_custkey                                    |
| |  row-size=32B cardinality=575.77K                                         |
| |                                                                           |
| |--04:EXCHANGE [HASH(c.c_custkey)]                                          |
| |  |                                                                        |
| |  00:SCAN HDFS [tpch.customer c]                                           |
| |     HDFS partitions=1/1 files=1 size=23.08MB                              |
| |     row-size=16B cardinality=150.00K                                      |
| |                                                                           |
| 03:EXCHANGE [HASH(l.l_orderkey)]                                            |
| |                                                                           |
| 01:SCAN HDFS [tpch.lineitem l]                                              |
|    HDFS partitions=1/1 files=1 size=718.94MB                                |
|    runtime filters: RF000 -> l.l_orderkey                                   |
|    row-size=16B cardinality=6.00M                                           |
+-----------------------------------------------------------------------------+


--
To view, visit http://gerrit.cloudera.org:8080/19511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia249c8146215fad602e9310bf922c6bfa050b96b
Gerrit-Change-Number: 19511
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 03 May 2023 21:09:01 +0000
Gerrit-HasComments: Yes
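
For illustration only, here is a minimal Java sketch of the two points raised in the comments above. It removes duplicates in a single pass with a hash set (expected O(n) instead of O(n^2)), and it treats "l.l_discount > c.c_acctbal" and "c.c_acctbal < l.l_discount" as the same predicate by putting the operands into a canonical order first. The Cmp and DedupSketch classes and the canonicalKey() method are hypothetical stand-ins invented for this example. They are not Impala's Expr API, and this is not a description of what the patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a binary comparison predicate (not Impala's Expr). */
final class Cmp {
  final String left;
  final String op; // one of "<", ">", "<=", ">=", "=", "!="
  final String right;

  Cmp(String left, String op, String right) {
    this.left = left;
    this.op = op;
    this.right = right;
  }

  /** Returns the operator that preserves the meaning when operands are swapped. */
  private static String flip(String op) {
    switch (op) {
      case "<": return ">";
      case ">": return "<";
      case "<=": return ">=";
      case ">=": return "<=";
      default: return op; // "=" and "!=" are symmetric
    }
  }

  /** Canonical form: operands in lexicographic order, operator flipped if swapped. */
  String canonicalKey() {
    if (left.compareTo(right) <= 0) return left + " " + op + " " + right;
    return right + " " + flip(op) + " " + left;
  }

  @Override
  public String toString() {
    return left + " " + op + " " + right;
  }
}

final class DedupSketch {
  /** Keeps the first occurrence of each predicate, in one pass over the input. */
  static List<Cmp> removeDuplicates(List<Cmp> conjuncts) {
    Set<String> seen = new HashSet<>();
    List<Cmp> result = new ArrayList<>();
    for (Cmp c : conjuncts) {
      // Set.add returns false if the canonical key was already present.
      if (seen.add(c.canonicalKey())) result.add(c);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Cmp> preds = Arrays.asList(
        new Cmp("l.l_discount", ">", "c.c_acctbal"),
        new Cmp("c.c_acctbal", "<", "l.l_discount"));
    System.out.println(removeDuplicates(preds)); // prints [l.l_discount > c.c_acctbal]
  }
}
```

A string key keeps the sketch short. A real planner would more likely rely on equals() and hashCode() of normalized expression trees, and whether such mirrored comparisons should count as identical predicates is exactly the open question in the second comment.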
