Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )
Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning ...................................................................... Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@13 PS2, Line 13: value Should this read "type"? http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@16 PS2, Line 16: the scalar value must be on a path : to the root of the nested value where every node on the path : is required I'm not sure I'm following the reasoning behind this. Please see my comments in the tests. http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1109 PS1, Line 1109: tScanRanges = 0; > hmm, looks like a proxy for estimating when a scan range will definitely be Sounds good, thanks for the clarification. http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567 PS2, Line 567: tryComputeMinMaxPredicate(analyzer, pred); nit: this looks like it could go on a single line now. http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@575 PS2, Line 575: tryComputeMinMaxPredicate(analyzer, pred); nit: single line? http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@9 PS2, Line 9: row_regex: .*NumStatsFilteredRowGroups: 2 .* While you're here, do you mind converting them to the aggregation(..) syntax? Then you could lift the restriction of 'num_nodes = 1' in test_nested_types.py. http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@58 PS2, Line 58: where bottom.item < -2; This looks like a c&p error from the query above. Can you double check that the tests run as you expect them to? http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@98 PS2, Line 98: where a.item.e < -10 This may be seem like an ignorant question, but doesn't this predicate make the bottom field required? In general, does having a predicate on a field mean that it must not be null? http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@107 PS2, Line 107: left outer join c.nested_struct.c.d cn, cn.item a where a.item.e < -10; Same here, if a row group has no values in nested_struct.c.d.item.item.e that are < -10, then their rows will not show up in the result, no? -- To view, visit http://gerrit.cloudera.org:8080/8480 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe Gerrit-Change-Number: 8480 Gerrit-PatchSet: 2 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Tue, 14 Nov 2017 18:54:08 +0000 Gerrit-HasComments: Yes