Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@439
PS5, Line 439:   private boolean isArrayPosReference(SlotRef slotRef) {
Move to SlotRef?


http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@562
PS5, Line 562:       // Adds only predicates for collections that are guarded 
by an IsNotEmptyPredicate.
guarded -> filtered

I think that makes it clearer that this is purely a performance optimization.


http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@563
PS5, Line 563:       // Its assumed that analysis adds these guards such that 
they are correct, but
Its assumed -> It is assumed


http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@95
PS5, Line 95: where a.item.e < -10;
Can you add a filter at every nesting level to make sure they all work together?

If that's too hard with complextypestbl, you can use tpch_nested_parquet.customer.


http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@96
PS5, Line 96: ---- PLAN
Do we need to go to explain level 2 in all these tests?


http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@99
PS5, Line 99: PLAN-ROOT SINK
Do we have tests for min-max filters on a top-level struct? I mean something 
like:

create table t (s struct<f1:int,f2:int>);

select 1 from t where s.f1 < 10;


http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@266
PS5, Line 266: # Test collections in a way that would incorrect to apply a 
min-max
Garbled sentence: "would incorrect" is missing a word ("would be incorrect").


http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@40
PS5, Line 40: where  int_map.value < -1;
Can you modify the tests to mix in more predicate variety? For example, use a 
binary predicate with "=", use an IN predicate somewhere, flip the lhs/rhs of 
some predicates, etc.
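
To illustrate why the variety matters, here is a minimal sketch of how each predicate shape might be checked against a column's min/max stats. It is not Impala's implementation: the class StatsPredicateSketch, the Op enum, and all method names are hypothetical, and values are simplified to long.

```java
import java.util.List;

/** Illustrative only; every name in this class is hypothetical. */
final class StatsPredicateSketch {
    enum Op { LT, LE, GT, GE, EQ }

    /** Mirrors an operator so that "lit op col" can be read as "col op' lit". */
    static Op flip(Op op) {
        switch (op) {
            case LT: return Op.GT;
            case LE: return Op.GE;
            case GT: return Op.LT;
            case GE: return Op.LE;
            default: return Op.EQ;  // equality is symmetric
        }
    }

    /** True if some value in [min, max] could satisfy "col op lit". */
    static boolean mayMatch(Op op, long lit, long min, long max) {
        switch (op) {
            case LT: return min < lit;
            case LE: return min <= lit;
            case GT: return max > lit;
            case GE: return max >= lit;
            default: return min <= lit && lit <= max;  // EQ
        }
    }

    /** "col IN (v1, v2, ...)" may match if any listed value lies in [min, max]. */
    static boolean mayMatchIn(List<Long> values, long min, long max) {
        for (long v : values) {
            if (mayMatch(Op.EQ, v, min, max)) return true;
        }
        return false;
    }
}
```

Each shape takes a different branch here ("<" only looks at min, "=" looks at both bounds, a flipped predicate has to be mirrored first), so a test suite that only uses "col < literal" would leave most of them unexercised.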


http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@145
PS5, Line 145: # False pruning example. There is one table that's scanned 
(complextypestbl).
Add 1-2 more tests along these lines with non-selective min-max filters that 
correctly prune nothing.
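
As a rough illustration of what "non-selective" means here, the sketch below counts how many row groups a "col < bound" filter could skip from min/max stats alone. It is a simplified model under my own assumptions, not the scanner's code: RowGroupPruningSketch, Stats, and both methods are hypothetical names, and it ignores the nested-collection case discussed above.

```java
/** Illustrative only; every name in this class is hypothetical. */
final class RowGroupPruningSketch {
    /** Min/max statistics of one column in one row group. */
    static final class Stats {
        final long min;
        final long max;
        Stats(long min, long max) { this.min = min; this.max = max; }
    }

    /** A row group can be skipped for "col < bound" only if even its minimum fails. */
    static boolean canSkipLessThan(Stats s, long bound) {
        return s.min >= bound;
    }

    /** Counts how many row groups the filter "col < bound" would skip. */
    static int countSkipped(Stats[] groups, long bound) {
        int skipped = 0;
        for (Stats s : groups) {
            if (canSkipLessThan(s, bound)) skipped++;
        }
        return skipped;
    }
}
```

With row groups whose stats are [1, 5] and [3, 9], a bound of 100 skips nothing (the non-selective case the tests should cover), while a bound of -10 skips both.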



--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Tue, 21 Nov 2017 22:05:04 +0000
Gerrit-HasComments: Yes
