Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )
Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning ...................................................................... Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG@12 PS1, Line 12: A nested value is defined to : be on a path of one or more nested types that is rooted at a : table column. > I don't understand what that sentence means. Can you try to clarify the dis Done http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@435 PS1, Line 435: // Checks if slot refers to an array "pos" pseudo-column. > Can you add a comment explaining why checking for getColumn() == null is no added a clarifying comment. when I stepped through this code, getColumn was casting the net too. Yes it caught references to "pos" but it also included all nested types as well, which is not the intent with this change. If expectations for getColumn are different, then I can look into why its not behaving as expected. http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@441 PS1, Line 441: isMapStruct > I think it would be clearer to add a isArrayStruct() method to CollectionSt Done http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@564 PS1, Line 564: > nit: the surrounding code seems to omit this space. Done http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567 PS1, Line 567: for (Expr pred: entry.getValue()) { : if (pred instanceof BinaryPredicate) { : tryComputeBinaryMinMaxPredicate(analyzer, (BinaryPredicate) pred); : } else if (pred instanceof InPredicate) { : tryComputeInListMinMaxPredicate(analyzer, (InPredicate) pred); : } : } > This looks like a duplication of the above loop. Adding additional predicat good point, done. http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1109 PS1, Line 1109: slot.getColumn() == null > Is this another check for a pos slot? hmm, looks like a proxy for estimating when a scan range will definitely be included. since nested types are currently not filtered, these columns will be scanned so it makes sense to increase the scanner estimate. for this patch, since we conservatively included possibly filtered scalar columns, we should treat nested types similarly, so it makes sense to me to leave the current logic as-is. http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@141 PS1, Line 141: Basics test > I'm not sure I understand what Basics means. Can you elaborate? I think we yeah, that was confusing so removed it and clarified that these queries are distinguished from the previous ones since they are reading from the tpch data set. http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@460 PS1, Line 460: ==== > Does this remove the trailing newline? reverted... I had put tests here first so must have changed this file. -- To view, visit http://gerrit.cloudera.org:8080/8480 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe Gerrit-Change-Number: 8480 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Tue, 14 Nov 2017 18:07:21 +0000 Gerrit-HasComments: Yes
