Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 )
Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries ...................................................................... Patch Set 18: (7 comments) This is looking good, but I have a couple points. First, if it doesn't have a test, then it doesn't exist. Maybe that's a bit strict for something this early, but it is the only thing that I know works. If we accumulate code without tests, it's a pain to go back and fix that. Along that vein, it would be good to have tests for the cases that bailout early from the Calcite planner. Some examples: 1. Testing semi join / anti join detection 2. Testing with decimal v1 / appx count distinct 3. Testing with Kudu/HBase table 4. Testing with complex types 5. Testing with a view 6. The logic in canStmtBePlannedThroughCalcite() Some of these will go away over time, and that's ok. Second, we have the first piece of fallback, but fallback is going to need more work. We're going to want to have reasonable information in the profiles and we'll want test cases. We are probably going to need to find a way to merge the timelines / account for time before Calcite falls back. Storing the error message that caused fallback into the profile would be useful. All of this is a long way of saying: I would be ok if this change didn't have fallback. http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh File bin/set-classpath.sh: http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@41 PS18, Line 41: Nit: stray line http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@56 PS18, Line 56: echo "USE_CALCITE_PLANNER" : echo $USE_CALCITE_PLANNER Can we skip this debug statement? http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@144 PS18, Line 144: catch (Exception e) { : LOG.info("Calcite planner failed."); : LOG.info("Exception: " + e); : if (e != null) { : LOG.info("Stack Trace:" + ExceptionUtils.getStackTrace(e)); : throw new InternalException(e.getMessage()); : } : throw new RuntimeException(e); : } For complete fallback support, this case would need to fall back. In practice, most exceptions are not CalcitePlannerUnsupportedExceptions. e.g. "select count(*) from functional.alltypes" doesn't fall back. There are a variety of test cases that I would want to see for fallback functionality (e.g. what does the profile show for fallback, what things fallback instantly versus fallback as a result of a Calcite error, etc). Should we skip fallback in this initial change? http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@159 PS18, Line 159: private boolean canStmtBePlannedThroughCalcite(QueryContext queryCtx) { : String stringWithFirstRealWord = queryCtx.getStmt(); : String[] lines = stringWithFirstRealWord.split("\n"); : // Get rid of comments and blank lines which start the query. We need to find : // the first real word. : // TODO: IMPALA-12976: need to make this more generic. Certain patterns aren't caught : // here like /* */ : for (String line : lines) { : if (line.trim().startsWith("--") || line.trim().equals("")) { : stringWithFirstRealWord = stringWithFirstRealWord.replaceFirst(line + "\n", ""); : } else { : break; : } : } : stringWithFirstRealWord = stringWithFirstRealWord.trim(); : String beforeStripString; : do { : beforeStripString = stringWithFirstRealWord; : stringWithFirstRealWord = StringUtils.stripStart(stringWithFirstRealWord, "("); : stringWithFirstRealWord = StringUtils.stripStart(stringWithFirstRealWord, null); : } while (!stringWithFirstRealWord.equals(beforeStripString)); : return StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "select") || : StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "values") || : StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "with"); : } : : private void checkUnsupportedFeatures(QueryContext queryCtx) : throws CalcitePlannerUnsupportedException { : if (SEMI_JOIN.matcher(queryCtx.getStmt()).find()) { : throw new CalcitePlannerUnsupportedException(NotSupported.SEMI_JOINS); : } : : if (ANTI_JOIN.matcher(queryCtx.getStmt()).find()) { : throw new CalcitePlannerUnsupportedException(NotSupported.ANTI_JOINS); : } : : TQueryOptions options = queryCtx.getTQueryCtx().client_request.getQuery_options(); : if (!options.isDecimal_v2()) { : throw new CalcitePlannerUnsupportedException(NotSupported.DECIMAL_V1); : } : : if (options.isAppx_count_distinct()) { : throw new CalcitePlannerUnsupportedException(NotSupported.APPX_COUNT_DISTINCT); : } : } Nit: These functions do something similar via different methods. One is a boolean check that returns early and calls the old planner. The other throws an exception that gets caught and calls the old planner. Unifying them to do things the same way (i.e. always do the boolean method or always do the exception) would be nice. Some of this also seems like it could be tested via a JUnit test. http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java: http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java@26 PS18, Line 26: ysupported Nit: supported http://gerrit.cloudera.org:8080/#/c/21109/18/testdata/workloads/functional-query/queries/QueryTest/calcite.test File testdata/workloads/functional-query/queries/QueryTest/calcite.test: http://gerrit.cloudera.org:8080/#/c/21109/18/testdata/workloads/functional-query/queries/QueryTest/calcite.test@11 PS18, Line 11: ---- QUERY : select * from calcite_alltypes; : ---- RESULTS : 0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1 : 1,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1',2009-01-01 00:01:00,2009,1 : 2,true,2,2,2,20,2.200000047683716,20.2,'01/01/09','2',2009-01-01 00:02:00.100000000,2009,1 : 3,false,3,3,3,30,3.299999952316284,30.3,'01/01/09','3',2009-01-01 00:03:00.300000000,2009,1 : 4,true,4,4,4,40,4.400000095367432,40.4,'01/01/09','4',2009-01-01 00:04:00.600000000,2009,1 : ---- TYPES : int,boolean,tinyint,smallint,int,bigint,float,double,string,string,timestamp,int,int : ==== For tests like this, is there a way to verify that it is using Calcite (i.e. it didn't fail over to the ordinary planner)? In some tests, we do extra checks on the runtime profile (see test with a RUNTIME_PROFILE section). (or if we don't do fallback, we can punt on that for now) http://gerrit.cloudera.org:8080/#/c/21109/18/testdata/workloads/functional-query/queries/QueryTest/calcite.test@32 PS18, Line 32: ==== : ---- QUERY : select d1,d2,d3,d4,d5,d6 from functional.decimal_tbl; : ---- RESULTS : 1234,2222,1.2345678900,0.12345678900000000000000000000000000000,12345.78900,1 : 12345,333,123.4567890000,0.12345678900000000000000000000000000000,11.22000,1 : 12345,333,1234.5678900000,0.12345678900000000000000000000000000000,0.10000,1 : 132842,333,12345.6789000000,0.12345678900000000000000000000000000000,0.77889,1 : 2345,111,12.3456789000,0.12345678900000000000000000000000000000,3.14100,1 : ---- TYPES : decimal,decimal,decimal,decimal,decimal,decimal : ==== It would be nice to have tests for SQLs that bail out of Calcite planning early (e.g. decimal v1, Kudu tables, semi joins, etc). -- To view, visit http://gerrit.cloudera.org:8080/21109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Gerrit-Change-Number: 21109 Gerrit-PatchSet: 18 Gerrit-Owner: Steve Carlin <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Tue, 09 Apr 2024 03:17:41 +0000 Gerrit-HasComments: Yes
