Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
......................................................................


Patch Set 10:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455
PS10, Line 455:     jbyteArray thrift_in_query_options) {
Not sure what 'thrift_in_query_options' means. How about using 'tquery_options' 
instead? Ideally, we would not need to pass in the existing query options.


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457
PS10, Line 457:   DeserializeThriftMsg(env, thrift_in_query_options, &options);
Needs error handling, e.g. using THROW_IF_ERROR_RET


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458
PS10, Line 458:   const char *o = env->GetStringUTFChars(options_str, NULL);
* You need to release the string UTF chars, take a look at JniUtfCharGuard in 
jni-util.h

* style: const char* o


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461
PS10, Line 461:   TParseQueryOptionsResult result;
I think it's simpler to convert all Status to an exception. That way we don't 
need the TParseQueryOptionsResult at all, and the error handling is consistent 
(always throws an exception in case of error).


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119
PS10, Line 119:     // Create runtime filters. 'singleNodePlan' now points to 
the root of the distributed
Comment is wrong. 'singleNodePlan' definitely does not point to the root of the 
distributed plan.

Use rootFragment.getPlanRoot() instead


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354
PS10, Line 354:     public void addTarget(RuntimeFilterTarget target) { 
targets_.add(target); }
add newline before


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397
PS10, Line 397:     Preconditions.checkState(maxNumFilters >= 0);
Why is 0 not a valid value?


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423
PS10, Line 423:       filter.computeNdvEstimate();
My understanding is that IMPALA-3450 has been fixed and we can remove this code.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537
PS10, Line 537:    * The assigned filters are the ones for which 'scanNode' can 
be used a destination
can be used as a destination node


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539
PS10, Line 539:    * 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a 
filter is only assigned to
If the DISABLE_ROW_RUNTIME_FILTERING query option is set ...


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541
PS10, Line 541:    * 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a 
filter is only assigned to
If the RUNTIME_FILTER_MODE query option is set ...


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542
PS10, Line 542:    *    'scanNode' if the filter is local to the scan node.
This doesn't really explain what the LOCAL option means. How about:

... a filter is only assigned to 'scanNode' if the filter is produced within 
the same fragment that contains the scan node.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@563
PS10, Line 563:       if (!isSingleNodeExec && runtimeFilterMode == 
TRuntimeFilterMode.LOCAL &&
Why the !isSingleNodeExec condition? I think that !isLocalTarget implies 
!isSingleNodeExec.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@576
PS10, Line 576:   static private boolean IsLocalTarget(RuntimeFilter filter, 
ScanNode targetNode) {
isLocalTarget()


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@583
PS10, Line 583:   static private boolean IsBoundByPartitionColumns(Analyzer 
analyzer, Expr targetExpr,
isBoundByPartitionColumns()


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@92
PS10, Line 92:   public native static byte[] NativeParseQueryOptions(String 
optionsStr,
to be clearer call the first arg csvQueryOptions


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@93
PS10, Line 93:
Can we get rid of the second argument and use 
PlannerTestBase.mergeQueryOptions() instead?


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@282
PS10, Line 282:   public static TParseQueryOptionsResult 
ParseQueryOptions(String optionsStr,
Public function needs a comment.

API feels a little weird. This function can throw but also return an error in 
the TParseQueryOptionsResult response.

I suggest converting all errors into exceptions and returning a TQueryOptions 
or alternative just modifying the given TQueryOptions in-place via 
PlannerTestBase.mergeQueryOptions(). Might need to factor that out into a 
utility function.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@300
PS10, Line 300:   public void testRuntimeFilterPruning() {
testRuntimeFilterQueryOptions()?


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288
PS10, Line 288:         parseQueryOptions(currentTestCase);
Why do we need to call this here and in L338? Isn't the call in L338 enough?


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test:

http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@2
PS10, Line 2: select count(*) from functional.alltypes a
When using join hints it is recommended to also use straight_join. Also please 
use the preferred comment-style hint syntax, i.e.:

select /* +straight_join */ count(*) from functional.alltypes a
  join /* +broadcast */ functional.alltypes b ...

Please apply everywhere in this file.

I know we are already inconsistent in our existing tests, but that should not 
stop us from improving new tests :)


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@85
PS10, Line 85: select count(*) from functional.alltypes a
Can we make it more obvious which runtime filters are the most selective? The 
selection is not intuitive to follow.


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@89
PS10, Line 89: ---- QUERYOPTIONS
This is really useful! Thanks for adding it!


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@91
PS10, Line 91: ---- PLAN
To reduce the number of test lines, let's only keep those sections that are 
really needed. In this case I think the DISTRIBUTEDPLAN should be enough.


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@167
PS10, Line 167: # DISABLE_ROW_RUNTIME_FILTERING is set: only partition column 
filters are applied
What's the effect of setting MAX_NUM_RUNTIME_FILTERS=3 in this test? Does it 
select the 3 partition-pruning filters as expected?


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@252
PS10, Line 252: # RUNTIME_FILTER_MODE is set to LOCAL: only local filters are 
applied
Same question as above. What happens when we set MAX_NUM_RUNTIME_FILTERS=4. 
Does it behave as expected?


http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@418
PS10, Line 418: # RUNTIME_FILTER_MODE is OFF: no filters are applied
Is this the same as MAX_NUM_RUNTIME_FILTERS=0?



--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Tue, 17 Oct 2017 22:06:31 +0000
Gerrit-HasComments: Yes

Reply via email to