Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )
Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities ...................................................................... Patch Set 3: (13 comments) Thanks for spliting the patch! It's much easier to review and iterate. http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@10 PS3, Line 10: query plan generate nit: generation? or "query planning", "query optimization" http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@19 PS3, Line 19: , even if : table has no stats. nit: regardless the existense of the stats. http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java File fe/src/main/java/org/apache/impala/analysis/TableRef.java: http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@173 PS3, Line 173: // Reserve 'TABLE_NUM_ROWS' hint value from query, we will use this value when construct : // HdfsScanNode if table does not have stats, or corrupt stats. Otherwise, Impala will : // use table original stats. nit: might be better to reword to Value of query hint 'TABLE_NUM_ROWS' on this table. Used in constructing HdfsScanNode if the table does not have stats, or has correct stats. -1 indicates no hint. http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@510 PS3, Line 510: isTableHintSupport nit: isTableHintSupported http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@514 PS3, Line 514: Table hints supported for Hdfs/Kudu tables now nit: reword to Table hints only supported for Hdfs/Kudu tables. http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@518 PS3, Line 518: if (hint.is("SCHEDULE_CACHE_LOCAL")) { Does this mean we support such hints for Kudu tables now? I think the SCHEDULE_* hints only apply on Hdfs tables. I think what we expect is only the TABLE_NUM_ROWS hint being used on both hdfs and kudu tables. The other hints should remain to be only used on hdfs tables. Although I think we should have scheduler hints for kudu tables as well, we can support them in another JIRA. http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@555 PS3, Line 555: // This hint is valid for Hdfs/Kudu table now nit: can we remove this comment? It seems no need to explain the following line. http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@556 PS3, Line 556: Long.valueOf nit: can use Long.parseLong() directly, which is used internally in Long.valueOf(). http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@564 PS3, Line 564: Support table row hint for hdfs and kudu table now. nit: reword to Returns whether the table supports table hint. Currently, only hdfs and kudu tables support it. http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1465 PS3, Line 1465: cardinality_ = extrapolatedNumRows_; Should we overwrite this as well if the hint exists? http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1542 PS3, Line 1542: * partitions with corrupt stats. Could you please mention the hint in this comment? http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1573 PS3, Line 1573: return tableNumRowsHint_ >= 0 ? tableNumRowsHint_ : partitionNumRows_; I thought we only use the hint when missing stats. This always overwrites the original stats which is ok but conflict with the commit message and some comments. Please update them appropriately. BTW, should we update partitionNumRows_ to be tableNumRowsHint_ if it's valid? Not sue if inconsistency in these two values will cause troubles. UPDATE: I see partitionNumRows_ is only used in getTableStatsExplainString() outside this method. It's shown in the query plan. It'd be better to update it as tableNumRowsHint_ to avoid confusion. http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java@81 PS3, Line 81: // Reserve 'TABLE_NUM_ROWS' hint value from query, we will use this value when construct : // ScanNode if table does not have stats, or corrupt stats. Otherwise, Impala will : // use table original stats. : // Only valid for hdfs table currently. nit: maybe we can just refer to the comment of TableRef.tableNumRowsHint_ -- To view, visit http://gerrit.cloudera.org:8080/18829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7 Gerrit-Change-Number: 18829 Gerrit-PatchSet: 3 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Tue, 16 Aug 2022 08:01:23 +0000 Gerrit-HasComments: Yes
