Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@13
PS2, Line 13: value
> Should this read "type"?
Clarified.


http://gerrit.cloudera.org:8080/#/c/8480/2//COMMIT_MSG@16
PS2, Line 16: the scalar value must be on a path
            : to the root of the nested value where every node on the path
            : is required
> I'm not sure I'm following the reasoning behind this. Please see my comment
Reworded, and added examples (in the tests) of cases that I had trouble
separating from cases that could indeed be pruned.


http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567
PS2, Line 567:       tryComputeMinMaxPredicate(analyzer, pred);
> nit: this looks like it could go on a single line now.
Done


http://gerrit.cloudera.org:8080/#/c/8480/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@575
PS2, Line 575:         tryComputeMinMaxPredicate(analyzer, pred);
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@9
PS2, Line 9: row_regex: .*NumStatsFilteredRowGroups: 2 .*
> While you're here, do you mind converting them to the aggregation(..) synta
Done. I didn't make that connection until you pointed it out.


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@58
PS2, Line 58: where bottom.item < -2;
> This looks like a c&p error from the query above. Can you double check that
The latest update works.


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@98
PS2, Line 98: where a.item.e < -10
> This may be seem like an ignorant question, but doesn't this predicate make
Good point. The current approach is conservative: this case could be pruned,
but we currently do not prune it. I've added a test case below that
illustrates a collection filter with no !empty guard, for which pruning would
be incorrect (I tried it, and it produces different results than we currently
do). I've also added a comment in HdfsScanNode to point out that the current
approach is conservative.
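
To make the point about the missing !empty guard concrete, here is a minimal
sketch in Java. It is a toy model, not Impala's planner or scanner code: the
Row class, the method names, and the sample data are all hypothetical, and the
outer-join behavior is simplified to "a top-level row is still emitted when no
nested item survives the filter".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: Row, the method names, and the data are hypothetical and
// are not taken from Impala.
class NestedPruningSketch {
    // A top-level row with one nested collection of integer values.
    static final class Row {
        final int id;
        final List<Integer> items;
        Row(int id, List<Integer> items) { this.id = id; this.items = items; }
    }

    // Min/max-style check for "item < bound": returns true when the smallest
    // nested value in the row group is >= bound, so no item can match.
    static boolean statsRuleOutLessThan(List<Row> rowGroup, int bound) {
        int min = Integer.MAX_VALUE;
        for (Row r : rowGroup) {
            for (int v : r.items) min = Math.min(min, v);
        }
        return min >= bound;
    }

    // Inner-join-like semantics: one output row per nested item that matches.
    static int countInner(List<Row> rowGroup, int bound) {
        int n = 0;
        for (Row r : rowGroup) {
            for (int v : r.items) if (v < bound) n++;
        }
        return n;
    }

    // Outer-join-like semantics: the filter applies to the nested items, but
    // a row with no matching item is still emitted once.
    static int countOuter(List<Row> rowGroup, int bound) {
        int n = 0;
        for (Row r : rowGroup) {
            int matches = 0;
            for (int v : r.items) if (v < bound) matches++;
            n += Math.max(matches, 1);
        }
        return n;
    }

    public static void main(String[] args) {
        List<Row> rowGroup = Arrays.asList(
            new Row(1, Arrays.asList(5, 7)),
            new Row(2, Collections.<Integer>emptyList()));
        int bound = -10;
        System.out.println("stats rule out matches: " + statsRuleOutLessThan(rowGroup, bound));
        System.out.println("inner-join rows: " + countInner(rowGroup, bound));
        System.out.println("outer-join rows: " + countOuter(rowGroup, bound));
    }
}
```

In this model the statistics show that no item is below -10. Skipping the row
group is harmless under the inner-join semantics (0 rows either way), but under
the outer-join semantics it would drop 2 rows that should be returned.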


http://gerrit.cloudera.org:8080/#/c/8480/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@107
PS2, Line 107: left outer join c.nested_struct.c.d cn, cn.item a where a.item.e < -10;
> Same here, if a row group has no values in nested_struct.c.d.item.item.e th
See the comment above.



--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Wed, 15 Nov 2017 02:46:49 +0000
Gerrit-HasComments: Yes
