Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )
Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities ...................................................................... Patch Set 5: (5 comments) Looks good! Complete the code section and have a few comments. I have not got a chance to review the test section. If it is similar to the one in the parent JIRA of this JIRA, would it be possible to address any concerns there in this JIRA? http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG@9 PS5, Line 9: Currently, We need execute 'COMPUTE STATS' manually to compute : table stats info. Stats is very useful for query planning. : Without these stats, query plan maybe worse. In order to solve : this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS', nit. Rewording. Currently, we run 'COMPUTE STATS' command to compute table stats which is very useful for query planning. Without these stats, a query plan may not be optimal. However, these stats may not be available, up to date, or valid. To workaround this problem, this patch adds a new query hint: ...... http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java File fe/src/main/java/org/apache/impala/analysis/TableRef.java: http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176 PS5, Line 176: tableNumRowsHint_ This new data member in class TableRef implies that two table ref objects may be assigned with different num rows, even when both refer to the same SQL table. select * from T /* table_num_row(100) */, T /* table_num_row(1000) */; It sounds like we need to have a step to verify this and raise an error. http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@569 PS5, Line 569: supportTableHint It is a good idea to specify the exact hint in question: supportTableRowHint(). http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1547 PS5, Line 1547: * Currently, we provide a table hint 'TABLE_NUM_ROWS' to set table rows manually if : * table has no stats or has corrupt stats. If the table has valid stats, this hint : * will be ignored. In my past experience with very large data warehouse, that the table num row hint can be applied to even a table with valid stats is very useful to judge plans under different table growth conditions. For this patch, I feel it is okey to support it only for missing or non valid stats first, and extend it for valid stats later on. http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@379 PS5, Line 379: no nit. has no stats -- To view, visit http://gerrit.cloudera.org:8080/18829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7 Gerrit-Change-Number: 18829 Gerrit-PatchSet: 5 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Thu, 08 Sep 2022 17:41:19 +0000 Gerrit-HasComments: Yes
