Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG@12
PS1, Line 12: A nested value is defined to
            : be on a path of one or more nested types that is rooted at a
            : table column.
I don't understand what that sentence means. Can you try to clarify the distinction between nested value and nested type?


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@435
PS1, Line 435: // Checks if slot refers to an array "pos" pseudo-column.
Can you add a comment explaining why checking for getColumn() == null is not sufficient?


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@441
PS1, Line 441: isMapStruct
I think it would be clearer to add an isArrayStruct() method to CollectionStructType to emphasize that that's what we're checking.


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@564
PS1, Line 564: 
nit: the surrounding code seems to omit this space.


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567
PS1, Line 567: for (Expr pred: entry.getValue()) {
             :   if (pred instanceof BinaryPredicate) {
             :     tryComputeBinaryMinMaxPredicate(analyzer, (BinaryPredicate) pred);
             :   } else if (pred instanceof InPredicate) {
             :     tryComputeInListMinMaxPredicate(analyzer, (InPredicate) pred);
             :   }
             : }
This looks like a duplication of the above loop. Adding additional predicates in the future may require changing both loops. Have you considered factoring it into its own method?
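
For illustration, here is a minimal sketch of what such a factoring might look like. The helper name tryComputeMinMaxPredicates, the class MinMaxSketch, and the stub Expr classes are hypothetical and exist only so the sketch compiles on its own; the real methods in HdfsScanNode also take an Analyzer, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Impala's Expr hierarchy (not the real classes).
class Expr {}
class BinaryPredicate extends Expr {}
class InPredicate extends Expr {}
class OtherPredicate extends Expr {}

class MinMaxSketch {
  // Records which handler ran, purely so the sketch has observable behavior.
  final List<String> computed = new ArrayList<>();

  // Stand-ins for the existing per-predicate methods (Analyzer argument omitted).
  void tryComputeBinaryMinMaxPredicate(BinaryPredicate pred) { computed.add("binary"); }
  void tryComputeInListMinMaxPredicate(InPredicate pred) { computed.add("in-list"); }

  // Hypothetical helper: the single place that dispatches on the predicate type.
  // Both loops would call this, so a new predicate type is added only once.
  void tryComputeMinMaxPredicates(List<Expr> preds) {
    for (Expr pred : preds) {
      if (pred instanceof BinaryPredicate) {
        tryComputeBinaryMinMaxPredicate((BinaryPredicate) pred);
      } else if (pred instanceof InPredicate) {
        tryComputeInListMinMaxPredicate((InPredicate) pred);
      }
      // Any other predicate type is skipped, as in the quoted loop.
    }
  }
}
```

Each of the two loops would then reduce to a single call such as tryComputeMinMaxPredicates(entry.getValue()).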


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1109
PS1, Line 1109: slot.getColumn() == null
Is this another check for a pos slot?


http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@141
PS1, Line 141: Basics test
I'm not sure I understand what Basics means. Can you elaborate? I think we often order tests by ascending complexity so that the simpler ones fail before the complex ones.


http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@460
PS1, Line 460: ====
Does this remove the trailing newline?


-- 
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:10:08 +0000
Gerrit-HasComments: Yes
