wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )
Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities ...................................................................... Patch Set 5: (4 comments) > (5 comments) > > Looks good! > > Complete the code section and have a few comments. > > I have not got a chance to review the test section. If it is > similar to the one in the parent JIRA of this JIRA, would it be > possible to address any concerns there in this JIRA? Hi Qifan, thanks for continued interest in this feature. This patch is split from https://gerrit.cloudera.org/#/c/18023. We will focus on table cardinality hint in this patch, and reduce previous patch to focus on selectivity hint. I've already minimize test cases in this patch. Remove unnecessary plan to keeping test file small. Such as PARALLEL and DISTRIBUTED plan, not exists in test file. Only 61 lines currently. If you are free, you can read it quickly and simply. http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG@9 PS5, Line 9: Currently, We need execute 'COMPUTE STATS' manually to compute : table stats info. Stats is very useful for query planning. : Without these stats, query plan maybe worse. In order to solve : this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS', > nit. Rewording. Done By the way, 'valid' -> 'invalid'? http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java File fe/src/main/java/org/apache/impala/analysis/TableRef.java: http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176 PS5, Line 176: tableNumRowsHint_ > This new data member in class TableRef implies that two table ref objects m Do you mean different 'table_num_row' value for same table in sql will cause different cardinality is a problem? We should raise error for this situation? Such as: select count(1) from functional_parquet.alltypes t1/* +TABLE_NUM_ROWS(1000) */,functional_parquet.alltypes t2/* +TABLE_NUM_ROWS(2000) */ where t1.id = t2.id; http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1547 PS5, Line 1547: * Currently, we provide a table hint 'TABLE_NUM_ROWS' to set table rows manually if : * table has no stats or has corrupt stats. If the table has valid stats, this hint : * will be ignored. > In my past experience with very large data warehouse, that the table num ro Thanks for adivce, Qifan. I agree that we can extend for valid stats later on if necessary. http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@379 PS5, Line 379: no > nit. has no stats Done -- To view, visit http://gerrit.cloudera.org:8080/18829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7 Gerrit-Change-Number: 18829 Gerrit-PatchSet: 5 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Tue, 13 Sep 2022 13:00:47 +0000 Gerrit-HasComments: Yes
