Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 )
Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables ...................................................................... Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/18574/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18574/3//COMMIT_MSG@17 PS3, Line 17: - SelectStmt does not have WHERE clause > Partitioning might use partition transforms, or maybe the table went throug Ok, I agree with not dealing with this in the current review. http://gerrit.cloudera.org:8080/#/c/18574/3//COMMIT_MSG@22 PS3, Line 22: - SelectList contains only 'count(*)' > Good suggestions, thanks for your codereview, I would like to create a foll tl;dr: the current solution is ok for me if you want to push this optimization quickly, but I would prefer a different approach While I agree with focusing on the simple select count(*) in this review, I think that it would be hard to extend the current implementation to be more generic. Specifically I think that it would be much better to create a plan in SingleNodePlanner that returns a constant instead of adding a special case in ClientRequestState::ExecQueryOrDmlRequest(). Some hints on how this could be done: - we already have a special case where we return a const plan based on metadata - the Node that can be used in Impala to return a const is Union. See this block: https://github.com/apache/impala/blob/c41e6941cafad453819b57e78b2083a3e64496e0/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1621 What happens there is that the function is expected to return a HdfsScanNode, but if the result can be computed from metadata, we get the result and create a UnionNode that returns this result. - ScanNodes for Iceberg tables are created in this block: https://github.com/apache/impala/blob/c41e6941cafad453819b57e78b2083a3e64496e0/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1873 There is also a function that can help to decide whether count star optimization is applicable (so there is no JOIN, WHERE, etc ... in the query): https://github.com/apache/impala/blob/c41e6941cafad453819b57e78b2083a3e64496e0/fe/src/main/java/org/apache/impala/planner/ScanNode.java#L156 Where this differs from what is needed in this patch is that it also allows queries that use partitioning columns (e.g. in WHERE), which is expressed by || || desc_.hasClusteringColsOnly() The tricky part in the function is !desc_.hasMaterializedSlots() , which means the no column is used from the table in the select list or WHERE, and this is only possible in case of constants and count( * ). -- To view, visit http://gerrit.cloudera.org:8080/18574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9 Gerrit-Change-Number: 18574 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward <lipeng...@sensorsdata.cn> Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Jian Zhang <zjsar...@gmail.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Xianqing He <hexianqing...@126.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 02 Jun 2022 17:36:09 +0000 Gerrit-HasComments: Yes