Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
......................................................................


Patch Set 12:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18574/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18574/3//COMMIT_MSG@22
PS3, Line 22:  - SelectList contains only 'count(*)'
> Hi Ringhofer, I tried generic implementation.
Thanks a lot for the change! I think that approach is correct, though I have 
some comments about which classes should implement this logic.


http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@552
PS12, Line 552: optimizePlainCountStarQuery
I think that the right place to call this is in SelectAnalyzer.analyze() in 
SelectStmt.java. So the number of rows would be always set for a select block 
if it is possible to apply the optimization, and later the expr rewriter could 
replace count(*) calls.


http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@59
PS12, Line 59:   // Total records num of the Iceberg table
             :   private long totalRecordsNum_;
As this is a global value for the whole select block, I think that this should 
go to Analyzer - it is always available during analyses, so it will be easier 
to get this info during expression rewrite. (subqueries / views have their own 
Analyzer).


http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1347
PS12, Line 1347:     List<SelectListItem> selectItems = 
this.getSelectList().getItems();
               :     if (selectItems.size() != 1) return;
               :     Expr expr = selectItems.get(0).getExpr();
               :     if (!(expr instanceof FunctionCallExpr)) return;
               :     FunctionCallExpr funcCallExpr = (FunctionCallExpr) expr;
               :     if 
(!funcCallExpr.getFnName().getFunction().equals("count")) return;
               :     if (!funcCallExpr.getParams().isStar()) {
               :       if (funcCallExpr.getParams().isDistinct()) return;
               :       if (funcCallExpr.getParams().exprs().size() != 1) return;
               :       Expr child = funcCallExpr.getChild(0);
               :       if (!Expr.IS_LITERAL.apply(child)) return;
               :       if (Expr.IS_NULL_VALUE.apply(child)) return;
               :     }
This is not strictly necessary - count(*) can be replaced with a literal even 
if there are other items in the select list. For example SELECT count(*), 
max(some_col) FROM t; can still profit a bit from not having to count rows.


http://gerrit.cloudera.org:8080/#/c/18574/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1365
PS12, Line 1365:       funcCallExpr.setTotalRecordsNum(num);
As I wrote in FunctionCallExpr, I think that it would be better to save this 
value in analyzer_.



--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zjsar...@gmail.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Xianqing He <hexianqing...@126.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Jun 2022 16:31:40 +0000
Gerrit-HasComments: Yes

Reply via email to