Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18023 )
Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities ...................................................................... Patch Set 9: (23 comments) Great test section. I completed the review on the section and added some comments, most of them are minor. Seems this patch can provide another simple and powerful tool for people to better optimize queries. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5053 PS9, Line 5053: "Syntax error in line 1" This error may not be user friendly for super long SQL query. Could we report the error specifically as follows? Table hint not recognized for <T>: a negative value (-1). http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5056 PS9, Line 5056: "Syntax error in line 1") same comment as above. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5060 PS9, Line 5060: "Syntax error in line 1" same comment as above. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5064 PS9, Line 5064: table nit. may add a qualifier "non-HDFS" before 'table' to make it clear. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5071 PS9, Line 5071: "Syntax error in line 1" See the comment on TABLE_NUM_ROWS for "syntax error in line 1". http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5085 PS9, Line 5085: "Syntax error in line 1") is this a mistake? 0.1 is a perfect selectivity value. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5087 PS9, Line 5087: Syntax error in line 1" Same as above. 0 is okay. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5093 PS9, Line 5093: ( should be [0,1], right? http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5098 PS9, Line 5098: 0.0 See the comment about. Should be allowed. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5102 PS9, Line 5102: l_shipdate <= (select '1998-09-02') " + : "/* +SELECTIVITY(0.5) >From the updated parser, it seems this hint should be accepted in that the >entire predicate is recognized first l_shipdate <= (select '1998-09-02'), >followed by the hint. http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test File testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test: http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test@90 PS9, Line 90: DISTRIBUTEDPLAN same argument as for selectivity hints. We probably do not need to test PARALLEL and DISTRIBUTED plan. http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test@261 PS9, Line 261: 100000 Maybe in the future we could allow short-hand notation such as 100K, 10G etc. http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test File testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test: http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@2 PS9, Line 2: default value is 0.1 Please add a comment after it to indicate the cardinality clause in the scan node 00 shows the actual value after the selectivity of 0.1 is applied: 6M X 0.01 = 600.12K. http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@70 PS9, Line 70: little nit. Since almost 98% values are less than '1998-09-02', we set http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@95 PS9, Line 95: 5.88M nice. I agree with QiangLong that we only need the serial plan in this test as the distributed plan is generated well after the cardinality is determined. http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@344 PS9, Line 344: There are no null value in 'l_shipdate' column, so this selectivity is 0 nit. Delete this sentence and add "Here we assume ...". http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@481 PS9, Line 481: predicate nit. predicate's http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@643 PS9, Line 643: 5.31M nice! http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@754 PS9, Line 754: This predicate cannot compute selectivity, default value is 0.1 nit. Delete this sentence and add "Here we assume ...". http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@780 PS9, Line 780: 3.57M Can you double check? It seems to me 3M should be the right number. See the test case for the following query select count(1) from tpch.lineitem where l_shipdate IS NULL /* +SELECTIVITY(0.5) */ at line 345. http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@822 PS9, Line 822: A simple example for selectivity hint to change join mode, just for testing! nit. A simple example to show that selectivity hint can help change join mode and be used as an optimization tool. http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@826 PS9, Line 826: Default value is 0.1, nit. Impala can not compute selectivity and assumes the default value of 0.1. If we assume the actual ... http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@827 PS9, Line 827: will be nit. becomes -- To view, visit http://gerrit.cloudera.org:8080/18023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b Gerrit-Change-Number: 18023 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Wed, 07 Sep 2022 17:35:33 +0000 Gerrit-HasComments: Yes
