Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
......................................................................


Patch Set 18:

(7 comments)

This is looking good, but I have a couple points.
First, if it doesn't have a test, then it doesn't exist. Maybe that's a bit 
strict for something this early, but it is the only thing that I know works. If 
we accumulate code without tests, it's a pain to go back and fix that. Along 
that vein, it would be good to have tests for the cases that bailout early from 
the Calcite planner. Some examples:
1. Testing semi join / anti join detection
2. Testing with decimal v1 / appx count distinct
3. Testing with Kudu/HBase table
4. Testing with complex types
5. Testing with a view
6. The logic in canStmtBePlannedThroughCalcite()
Some of these will go away over time, and that's ok.

Second, we have the first piece of fallback, but fallback is going to need more 
work. We're going to want to have reasonable information in the profiles and 
we'll want test cases. We are probably going to need to find a way to merge the 
timelines / account for time before Calcite falls back. Storing the error 
message that caused fallback into the profile would be useful. All of this is a 
long way of saying: I would be ok if this change didn't have fallback.

http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh
File bin/set-classpath.sh:

http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@41
PS18, Line 41:
Nit: stray line


http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@56
PS18, Line 56: echo "USE_CALCITE_PLANNER"
             : echo $USE_CALCITE_PLANNER
Can we skip this debug statement?


http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@144
PS18, Line 144:  catch (Exception e) {
              :       LOG.info("Calcite planner failed.");
              :       LOG.info("Exception: " + e);
              :       if (e != null) {
              :         LOG.info("Stack Trace:" + 
ExceptionUtils.getStackTrace(e));
              :         throw new InternalException(e.getMessage());
              :       }
              :       throw new RuntimeException(e);
              :     }
For complete fallback support, this case would need to fall back. In practice, 
most exceptions are not CalcitePlannerUnsupportedExceptions. e.g. "select 
count(*) from functional.alltypes" doesn't fall back.

There are a variety of test cases that I would want to see for fallback 
functionality (e.g. what does the profile show for fallback, what things 
fallback instantly versus fallback as a result of a Calcite error, etc). Should 
we skip fallback in this initial change?


http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@159
PS18, Line 159:   private boolean canStmtBePlannedThroughCalcite(QueryContext 
queryCtx) {
              :     String stringWithFirstRealWord = queryCtx.getStmt();
              :     String[] lines = stringWithFirstRealWord.split("\n");
              :     // Get rid of comments and blank lines which start the 
query. We need to find
              :     // the first real word.
              :     // TODO: IMPALA-12976: need to make this more generic. 
Certain patterns aren't caught
              :     // here like /* */
              :     for (String line : lines) {
              :       if (line.trim().startsWith("--") || 
line.trim().equals("")) {
              :         stringWithFirstRealWord = 
stringWithFirstRealWord.replaceFirst(line + "\n", "");
              :       } else {
              :         break;
              :       }
              :     }
              :     stringWithFirstRealWord = stringWithFirstRealWord.trim();
              :     String beforeStripString;
              :     do {
              :       beforeStripString = stringWithFirstRealWord;
              :       stringWithFirstRealWord = 
StringUtils.stripStart(stringWithFirstRealWord, "(");
              :       stringWithFirstRealWord = 
StringUtils.stripStart(stringWithFirstRealWord, null);
              :     } while 
(!stringWithFirstRealWord.equals(beforeStripString));
              :     return 
StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "select") ||
              :         
StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "values") ||
              :         
StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "with");
              :   }
              :
              :   private void checkUnsupportedFeatures(QueryContext queryCtx)
              :       throws CalcitePlannerUnsupportedException {
              :     if (SEMI_JOIN.matcher(queryCtx.getStmt()).find()) {
              :       throw new 
CalcitePlannerUnsupportedException(NotSupported.SEMI_JOINS);
              :     }
              :
              :     if (ANTI_JOIN.matcher(queryCtx.getStmt()).find()) {
              :       throw new 
CalcitePlannerUnsupportedException(NotSupported.ANTI_JOINS);
              :     }
              :
              :     TQueryOptions options = 
queryCtx.getTQueryCtx().client_request.getQuery_options();
              :     if (!options.isDecimal_v2()) {
              :       throw new 
CalcitePlannerUnsupportedException(NotSupported.DECIMAL_V1);
              :     }
              :
              :     if (options.isAppx_count_distinct()) {
              :       throw new 
CalcitePlannerUnsupportedException(NotSupported.APPX_COUNT_DISTINCT);
              :     }
              :   }
Nit: These functions do something similar via different methods. One is a 
boolean check that returns early and calls the old planner. The other throws an 
exception that gets caught and calls the old planner. Unifying them to do 
things the same way (i.e. always do the boolean method or always do the 
exception) would be nice.

Some of this also seems like it could be tested via a JUnit test.


http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java:

http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java@26
PS18, Line 26: ysupported
Nit: supported


http://gerrit.cloudera.org:8080/#/c/21109/18/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/18/testdata/workloads/functional-query/queries/QueryTest/calcite.test@11
PS18, Line 11: ---- QUERY
             : select * from calcite_alltypes;
             : ---- RESULTS
             : 0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1
             : 
1,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1',2009-01-01 
00:01:00,2009,1
             : 2,true,2,2,2,20,2.200000047683716,20.2,'01/01/09','2',2009-01-01 
00:02:00.100000000,2009,1
             : 
3,false,3,3,3,30,3.299999952316284,30.3,'01/01/09','3',2009-01-01 
00:03:00.300000000,2009,1
             : 4,true,4,4,4,40,4.400000095367432,40.4,'01/01/09','4',2009-01-01 
00:04:00.600000000,2009,1
             : ---- TYPES
             : 
int,boolean,tinyint,smallint,int,bigint,float,double,string,string,timestamp,int,int
             : ====
For tests like this, is there a way to verify that it is using Calcite (i.e. it 
didn't fail over to the ordinary planner)? In some tests, we do extra checks on 
the runtime profile (see test with a RUNTIME_PROFILE section).

(or if we don't do fallback, we can punt on that for now)


http://gerrit.cloudera.org:8080/#/c/21109/18/testdata/workloads/functional-query/queries/QueryTest/calcite.test@32
PS18, Line 32: ====
             : ---- QUERY
             : select d1,d2,d3,d4,d5,d6 from functional.decimal_tbl;
             : ---- RESULTS
             : 
1234,2222,1.2345678900,0.12345678900000000000000000000000000000,12345.78900,1
             : 
12345,333,123.4567890000,0.12345678900000000000000000000000000000,11.22000,1
             : 
12345,333,1234.5678900000,0.12345678900000000000000000000000000000,0.10000,1
             : 
132842,333,12345.6789000000,0.12345678900000000000000000000000000000,0.77889,1
             : 
2345,111,12.3456789000,0.12345678900000000000000000000000000000,3.14100,1
             : ---- TYPES
             : decimal,decimal,decimal,decimal,decimal,decimal
             : ====
It would be nice to have tests for SQLs that bail out of Calcite planning early 
(e.g. decimal v1, Kudu tables, semi joins, etc).



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 18
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Tue, 09 Apr 2024 03:17:41 +0000
Gerrit-HasComments: Yes

Reply via email to