Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG@12
PS1, Line 12: A nested value is defined to
            : be on a path of one or more nested types that is rooted at a
            : table column.
I don't understand what that sentence means. Can you try to clarify the 
distinction between nested value and nested type?


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@435
PS1, Line 435:   // Checks if slot refers to an array "pos" pseudo-column.
Can you add a comment explaining why checking for getColumn() == null is not 
sufficient?


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@441
PS1, Line 441: isMapStruct
I think it would be clearer to add an isArrayStruct() method to 
CollectionStructType to emphasize that that's what we're checking.


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@564
PS1, Line 564:
nit: the surrounding code seems to omit this space.


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567
PS1, Line 567:       for (Expr pred: entry.getValue()) {
             :         if (pred instanceof BinaryPredicate) {
             :           tryComputeBinaryMinMaxPredicate(analyzer, (BinaryPredicate) pred);
             :         } else if (pred instanceof InPredicate) {
             :           tryComputeInListMinMaxPredicate(analyzer, (InPredicate) pred);
             :         }
             :       }
This looks like a duplication of the above loop. Adding additional predicates 
in the future may require changing both loops. Have you considered factoring it 
into its own method?
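
To illustrate the kind of refactoring meant here, a minimal self-contained 
sketch follows. It is not code from the patch: the nested Expr, 
BinaryPredicate and InPredicate classes are hypothetical stand-ins for the 
planner classes, the helper name computeMinMaxPredicates is made up, and the 
analyzer argument is dropped to keep the example runnable on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinMaxDispatchSketch {
  // Hypothetical stand-ins for the planner's expression classes.
  static class Expr {}
  static class BinaryPredicate extends Expr {}
  static class InPredicate extends Expr {}

  // Records which handler ran, in order. Stands in for the real side effects.
  static final List<String> handled = new ArrayList<>();

  // Simplified handlers; the real ones also take an analyzer.
  static void tryComputeBinaryMinMaxPredicate(BinaryPredicate pred) {
    handled.add("binary");
  }

  static void tryComputeInListMinMaxPredicate(InPredicate pred) {
    handled.add("in-list");
  }

  // Hypothetical shared helper: both loops would call this, so supporting a
  // new predicate type means touching only one place.
  static void computeMinMaxPredicates(List<? extends Expr> preds) {
    for (Expr pred : preds) {
      if (pred instanceof BinaryPredicate) {
        tryComputeBinaryMinMaxPredicate((BinaryPredicate) pred);
      } else if (pred instanceof InPredicate) {
        tryComputeInListMinMaxPredicate((InPredicate) pred);
      }
      // Other expression types are skipped, as in the original loops.
    }
  }

  public static void main(String[] args) {
    // First call site, e.g. predicates on top-level columns.
    computeMinMaxPredicates(Arrays.asList(new BinaryPredicate(), new Expr()));
    // Second call site, e.g. predicates on collection slots.
    computeMinMaxPredicates(Arrays.asList(new InPredicate()));
    System.out.println(handled);
  }
}
```

With a helper like this, each of the two loops shrinks to a single call, and 
the instanceof dispatch lives in one place.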


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1109
PS1, Line 1109: slot.getColumn() == null
Is this another check for a pos slot?


http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@141
PS1, Line 141: Basics test
I'm not sure I understand what Basics means. Can you elaborate? I think we 
often order tests by ascending complexity so that the simpler ones fail before 
the complex ones.


http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@460
PS1, Line 460: ====
Does this remove the trailing newline?



--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:10:08 +0000
Gerrit-HasComments: Yes
