Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 )
Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables ...................................................................... Patch Set 12: (5 comments) http://gerrit.cloudera.org:8080/#/c/18574/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18574/3//COMMIT_MSG@22 PS3, Line 22: - SelectList contains only 'count(*)' > Hi Ringhofer, I tried generic implementation. Thanks a lot for the change! I think that approach is correct, though I have some comments about which classes should implement this logic. http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@552 PS12, Line 552: optimizePlainCountStarQuery I think that the right place to call this is in SelectAnalyzer.analyze() in SelectStmt.java. So the number of rows would be always set for a select block if it is possible to apply the optimization, and later the expr rewriter could replace count(*) calls. http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@59 PS12, Line 59: // Total records num of the Iceberg table : private long totalRecordsNum_; As this is a global value for the whole select block, I think that this should go to Analyzer - it is always available during analyses, so it will be easier to get this info during expression rewrite. (subqueries / views have their own Analyzer). http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1347 PS12, Line 1347: List<SelectListItem> selectItems = this.getSelectList().getItems(); : if (selectItems.size() != 1) return; : Expr expr = selectItems.get(0).getExpr(); : if (!(expr instanceof FunctionCallExpr)) return; : FunctionCallExpr funcCallExpr = (FunctionCallExpr) expr; : if (!funcCallExpr.getFnName().getFunction().equals("count")) return; : if (!funcCallExpr.getParams().isStar()) { : if (funcCallExpr.getParams().isDistinct()) return; : if (funcCallExpr.getParams().exprs().size() != 1) return; : Expr child = funcCallExpr.getChild(0); : if (!Expr.IS_LITERAL.apply(child)) return; : if (Expr.IS_NULL_VALUE.apply(child)) return; : } This is not strictly necessary - count(*) can be replaced with a literal even if there are other items in the select list. For example SELECT count(*), max(some_col) FROM t; can still profit a bit from not having to count rows. http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1365 PS12, Line 1365: funcCallExpr.setTotalRecordsNum(num); As I wrote in FunctionCallExpr, I think that it would be better to save this value in analyzer_. -- To view, visit http://gerrit.cloudera.org:8080/18574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9 Gerrit-Change-Number: 18574 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward <lipeng...@sensorsdata.cn> Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Jian Zhang <zjsar...@gmail.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Xianqing He <hexianqing...@126.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 15 Jun 2022 16:31:40 +0000 Gerrit-HasComments: Yes