Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )
Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning
......................................................................

Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@439
PS5, Line 439: private boolean isArrayPosReference(SlotRef slotRef) {
Move to SlotRef?

http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@562
PS5, Line 562: // Adds only predicates for collections that are guarded by an IsNotEmptyPredicate.
guarded -> filtered. I think that makes it clearer that it's a pure perf optimization.

http://gerrit.cloudera.org:8080/#/c/8480/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@563
PS5, Line 563: // Its assumed that analysis adds these guards such that they are correct, but
It is assumed

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@95
PS5, Line 95: where a.item.e < -10;
Can you add a filter at all nesting levels to make sure they all work together? If that's too hard with complextypestbl, you can use tpch_nested_parquet.customer.

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@96
PS5, Line 96: ---- PLAN
Do we need to go to explain level 2 in all these tests?

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@99
PS5, Line 99: PLAN-ROOT SINK
Do we have tests for min-max filters on a top-level struct? I mean something like:

  create table t (s struct<f1:int,f2:int>);
  select 1 from t where s.f1 < 10

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@266
PS5, Line 266: # Test collections in a way that would incorrect to apply a min-max
garbled sentence

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@40
PS5, Line 40: where int_map.value < -1;
Can you modify the tests to mix in more predicate variety? For example, use a binary predicate with "=", use an IN predicate somewhere, flip the lhs/rhs of some predicates, etc.

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@145
PS5, Line 145: # False pruning example. There is one table that's scanned (complextypestbl).
Add 1-2 more tests along these lines with non-selective min-max filters that correctly prune nothing.
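
The predicate-variety and false-pruning comments both depend on when a row group's min/max statistics can rule out a predicate. As a reference for choosing test cases, here is a minimal sketch of that decision for an integer column covering "<", "=", a flipped lhs/rhs, and IN. It is hypothetical, not Impala's HdfsScanNode or backend code. The class and method names are made up, and it assumes the stats are exact bounds over non-NULL values (NULL handling is ignored).

```java
import java.util.List;

/**
 * Hypothetical sketch, not Impala's implementation: decides whether a Parquet
 * row group can be skipped for a simple predicate on an integer column, given
 * the row group's min/max stats (assumed exact, NULLs ignored).
 */
public class MinMaxPruningSketch {
  enum Op { LT, LE, EQ, GE, GT }

  /** Returns true if no value in [min, max] can satisfy "col op c". */
  static boolean canSkip(long min, long max, Op op, long c) {
    switch (op) {
      case LT: return min >= c;          // every value is >= c
      case LE: return min > c;           // every value is > c
      case EQ: return c < min || c > max; // c lies outside the range
      case GE: return max < c;           // every value is < c
      case GT: return max <= c;          // every value is <= c
      default: throw new IllegalArgumentException("unknown op: " + op);
    }
  }

  /** Rewrites "c op col" as "col op' c" so flipped predicates reuse canSkip. */
  static Op flip(Op op) {
    switch (op) {
      case LT: return Op.GT;
      case LE: return Op.GE;
      case GE: return Op.LE;
      case GT: return Op.LT;
      default: return op; // EQ is symmetric
    }
  }

  /** "col IN (values)" can be skipped only if every value is outside [min, max]. */
  static boolean canSkipIn(long min, long max, List<Long> values) {
    for (long v : values) {
      if (v >= min && v <= max) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Row group with stats min = -5, max = 20.
    System.out.println(canSkip(-5, 20, Op.LT, -10));            // true: col < -10 prunes
    System.out.println(canSkip(-5, 20, flip(Op.GT), -10));      // true: -10 > col, flipped
    System.out.println(canSkip(-5, 20, Op.EQ, 7));              // false: 7 may be present
    System.out.println(canSkipIn(-5, 20, List.of(-100L, 50L))); // true: both values out of range
    System.out.println(canSkip(-5, 20, Op.LT, 100));            // false: non-selective
  }
}
```

The last call is the kind of non-selective filter the false-pruning tests would exercise: the predicate overlaps [min, max], so the row group must be scanned and nothing is pruned.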

--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Tue, 21 Nov 2017 22:05:04 +0000
Gerrit-HasComments: Yes
