Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 )
Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE ...................................................................... Patch Set 10: (27 comments) http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455 PS10, Line 455: jbyteArray thrift_in_query_options) { Not sure what 'thrift_in_query_options' means. How about using 'tquery_options' instead? Ideally, we would not need to pass in the existing query options. http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457 PS10, Line 457: DeserializeThriftMsg(env, thrift_in_query_options, &options); Needs error handling, e.g. using THROW_IF_ERROR_RET http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458 PS10, Line 458: const char *o = env->GetStringUTFChars(options_str, NULL); * You need to release the string UTF chars, take a look at JniUtfCharGuard in jni-util.h * style: const char* o http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461 PS10, Line 461: TParseQueryOptionsResult result; I think it's simpler to convert all Status to an exception. That way we don't need the TParseQueryOptionsResult at all, and the error handling is consistent (always throws an exception in case of error). http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119 PS10, Line 119: // Create runtime filters. 'singleNodePlan' now points to the root of the distributed Comment is wrong. 'singleNodePlan' definitely does not point to the root of the distributed plan. Use rootFragment.getPlanRoot() instead http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354 PS10, Line 354: public void addTarget(RuntimeFilterTarget target) { targets_.add(target); } add newline before http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397 PS10, Line 397: Preconditions.checkState(maxNumFilters >= 0); Why is 0 not a valid value? http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423 PS10, Line 423: filter.computeNdvEstimate(); My understanding is that IMPALA-3450 has been fixed and we can remove this code. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537 PS10, Line 537: * The assigned filters are the ones for which 'scanNode' can be used a destination can be used as a destination node http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539 PS10, Line 539: * 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a filter is only assigned to If the DISABLE_ROW_RUNTIME_FILTERING query option is set ... http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541 PS10, Line 541: * 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a filter is only assigned to If the RUNTIME_FILTER_MODE query option is set ... http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542 PS10, Line 542: * 'scanNode' if the filter is local to the scan node. This doesn't really explain what the LOCAL option means. How about: ... a filter is only assigned to 'scanNode' if the filter is produced within the same fragment that contains the scan node. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@563 PS10, Line 563: if (!isSingleNodeExec && runtimeFilterMode == TRuntimeFilterMode.LOCAL && Why the !isSingleNodeExec condition? I think that !isLocalTarget implies !isSingleNodeExec. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@576 PS10, Line 576: static private boolean IsLocalTarget(RuntimeFilter filter, ScanNode targetNode) { isLocalTarget() http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@583 PS10, Line 583: static private boolean IsBoundByPartitionColumns(Analyzer analyzer, Expr targetExpr, isBoundByPartitionColumns() http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@92 PS10, Line 92: public native static byte[] NativeParseQueryOptions(String optionsStr, to be clearer call the first arg csvQueryOptions http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@93 PS10, Line 93: Can we get rid of the second argument and use PlannerTestBase.mergeQueryOptions() instead? http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@282 PS10, Line 282: public static TParseQueryOptionsResult ParseQueryOptions(String optionsStr, Public function needs a comment. API feels a little weird. This function can throw but also return an error in the TParseQueryOptionsResult response. I suggest converting all errors into exceptions and returning a TQueryOptions or alternative just modifying the given TQueryOptions in-place via PlannerTestBase.mergeQueryOptions(). Might need to factor that out into a utility function. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@300 PS10, Line 300: public void testRuntimeFilterPruning() { testRuntimeFilterQueryOptions()? http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288 PS10, Line 288: parseQueryOptions(currentTestCase); Why do we need to call this here and in L338? Isn't the call in L338 enough? http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test: http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@2 PS10, Line 2: select count(*) from functional.alltypes a When using join hints it is recommended to also use straight_join. Also please use the preferred comment-style hint syntax, i.e.: select /* +straight_join */ count(*) from functional.alltypes a join /* +broadcast */ functional.alltypes b ... Please apply everywhere in this file. I know we are already inconsistent in our existing tests, but that should not stop us from improving new tests :) http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@85 PS10, Line 85: select count(*) from functional.alltypes a Can we make it more obvious which runtime filters are the most selective? The selection is not intuitive to follow. http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@89 PS10, Line 89: ---- QUERYOPTIONS This is really useful! Thanks for adding it! http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@91 PS10, Line 91: ---- PLAN To reduce the number of test lines, let's only keep those sections that are really needed. In this case I think the DISTRIBUTEDPLAN should be enough. http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@167 PS10, Line 167: # DISABLE_ROW_RUNTIME_FILTERING is set: only partition column filters are applied What's the effect of setting MAX_NUM_RUNTIME_FILTERS=3 in this test? Does it select the 3 partition-pruning filters as expected? http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@252 PS10, Line 252: # RUNTIME_FILTER_MODE is set to LOCAL: only local filters are applied Same question as above. What happens when we set MAX_NUM_RUNTIME_FILTERS=4. Does it behave as expected? http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@418 PS10, Line 418: # RUNTIME_FILTER_MODE is OFF: no filters are applied Is this the same as MAX_NUM_RUNTIME_FILTERS=0? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 10 Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Tue, 17 Oct 2017 22:06:31 +0000 Gerrit-HasComments: Yes
