Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@13
PS2, Line 13: value
Should this read "type"?


http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@16
PS2, Line 16: the scalar value must be on a path
            : to the root of the nested value where every node on the path
            : is required
I'm not sure I'm following the reasoning behind this. Please see my comments in 
the tests.


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1109
PS1, Line 1109: tScanRanges = 0;
> hmm, looks like a proxy for estimating when a scan range will definitely be
Sounds good, thanks for the clarification.


http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567
PS2, Line 567:       tryComputeMinMaxPredicate(analyzer, pred);
nit: this looks like it could go on a single line now.


http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@575
PS2, Line 575:         tryComputeMinMaxPredicate(analyzer, pred);
nit: single line?


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@9
PS2, Line 9: row_regex: .*NumStatsFilteredRowGroups: 2 .*
While you're here, do you mind converting them to the aggregation(..) syntax? 
Then you could lift the restriction of 'num_nodes = 1' in test_nested_types.py.


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@58
PS2, Line 58: where bottom.item < -2;
This looks like a c&p error from the query above. Can you double check that the 
tests run as you expect them to?


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@98
PS2, Line 98: where a.item.e < -10
This may be seem like an ignorant question, but doesn't this predicate make the 
bottom field required? In general, does having a predicate on a field mean that 
it must not be null?


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@107
PS2, Line 107: left outer join c.nested_struct.c.d cn, cn.item a where a.item.e 
< -10;
Same here, if a row group has no values in nested_struct.c.d.item.item.e that 
are < -10, then their rows will not show up in the result, no?



--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 18:54:08 +0000
Gerrit-HasComments: Yes

Reply via email to