Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@13
PS2, Line 13: value
> Should this read "type"?
clarified


http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@16
PS2, Line 16: the scalar value must be on a path
            : to the root of the nested value where every node on the path
            : is required
> I'm not sure I'm following the reasoning behind this. Please see my comment
reworded and included examples (in the tests) that I had trouble separating from cases that could indeed be pruned.


http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567
PS2, Line 567: tryComputeMinMaxPredicate(analyzer, pred);
> nit: this looks like it could go on a single line now.
Done


http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@575
PS2, Line 575: tryComputeMinMaxPredicate(analyzer, pred);
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@9
PS2, Line 9: row_regex: .*NumStatsFilteredRowGroups: 2 .*
> While you're here, do you mind converting them to the aggregation(..) synta
done. didn't make that connection until you pointed it out.


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@58
PS2, Line 58: where bottom.item < -2;
> This looks like a c&p error from the query above.
Can you double check that latest update works.


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@98
PS2, Line 98: where a.item.e < -10
> This may be seem like an ignorant question, but doesn't this predicate make
good point-- seems that the current approach is conservative. this could be pruned, but we currently do not. I've added a testcase below that illustrates an example of a collection filter, with no !empty guard, for which pruning would be incorrect (tried it and it produces different results than we currently do). I've also added a comment in the HDFSScanNode to point out that the current approach is conservative.


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@107
PS2, Line 107: left outer join c.nested_struct.c.d cn, cn.item a where a.item.e < -10;
> Same here, if a row group has no values in nested_struct.c.d.item.item.e th
see comment above.



-- 
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Wed, 15 Nov 2017 02:46:49 +0000
Gerrit-HasComments: Yes
