Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 3:

(13 comments)

Thanks for spliting the patch! It's much easier to review and iterate.

http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@10
PS3, Line 10: query plan generate
nit: generation? or "query planning", "query optimization"


http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@19
PS3, Line 19: , even if
            : table has no stats.
nit: regardless the existense of the stats.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@173
PS3, Line 173:   // Reserve 'TABLE_NUM_ROWS' hint value from query, we will use 
this value when construct
             :   // HdfsScanNode if table does not have stats, or corrupt 
stats. Otherwise, Impala will
             :   // use table original stats.
nit: might be better to reword to

Value of query hint 'TABLE_NUM_ROWS' on this table. Used in constructing 
HdfsScanNode if the table does not have stats, or has correct stats. -1 
indicates no hint.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@510
PS3, Line 510: isTableHintSupport
nit: isTableHintSupported


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@514
PS3, Line 514: Table hints supported for Hdfs/Kudu tables now
nit: reword to

Table hints only supported for Hdfs/Kudu tables.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@518
PS3, Line 518:       if (hint.is("SCHEDULE_CACHE_LOCAL")) {
Does this mean we support such hints for Kudu tables now? I think the 
SCHEDULE_* hints only apply on Hdfs tables.

I think what we expect is only the TABLE_NUM_ROWS hint being used on both hdfs 
and kudu tables. The other hints should remain to be only used on hdfs tables. 
Although I think we should have scheduler hints for kudu tables as well, we can 
support them in another JIRA.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@555
PS3, Line 555: // This hint is valid for Hdfs/Kudu table now
nit: can we remove this comment? It seems no need to explain the following line.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@556
PS3, Line 556: Long.valueOf
nit: can use Long.parseLong() directly, which is used internally in 
Long.valueOf().


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@564
PS3, Line 564: Support table row hint for hdfs and kudu table now.
nit: reword to

Returns whether the table supports table hint. Currently, only hdfs and kudu 
tables support it.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1465
PS3, Line 1465:       cardinality_ = extrapolatedNumRows_;
Should we overwrite this as well if the hint exists?


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1542
PS3, Line 1542:    * partitions with corrupt stats.
Could you please mention the hint in this comment?


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1573
PS3, Line 1573:       return tableNumRowsHint_ >= 0 ? tableNumRowsHint_ : 
partitionNumRows_;
I thought we only use the hint when missing stats. This always overwrites the 
original stats which is ok but conflict with the commit message and some 
comments. Please update them appropriately.

BTW, should we update partitionNumRows_ to be tableNumRowsHint_ if it's valid? 
Not sue if inconsistency in these two values will cause troubles.

UPDATE: I see partitionNumRows_ is only used in getTableStatsExplainString() 
outside this method. It's shown in the query plan. It'd be better to update it 
as tableNumRowsHint_ to avoid confusion.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java@81
PS3, Line 81:   // Reserve 'TABLE_NUM_ROWS' hint value from query, we will use 
this value when construct
            :   // ScanNode if table does not have stats, or corrupt stats. 
Otherwise, Impala will
            :   // use table original stats.
            :   // Only valid for hdfs table currently.
nit: maybe we can just refer to the comment of TableRef.tableNumRowsHint_



--
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Fucun Chu <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 16 Aug 2022 08:01:23 +0000
Gerrit-HasComments: Yes

Reply via email to