Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )
Change subject: IMPALA-4993: extend dictionary filtering to collections ...................................................................... Patch Set 4: (11 comments) This looks good to me, I only added minor clean-up comments and will do a final pass on the next PS. http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h@90 PS4, Line 90: /// sequence-based file This looks like a rebase artifact. I often look at diffs between patchsets (and I think others do, too) so if possible, it helps to do a final rebase at the end instead of rebaseing in the middle. Sometimes rebases are inevitable, in which case it's fine of course. http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@90 PS4, Line 90: std:: We usually drop the std:: prefix in cc files. http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@96 PS4, Line 96: std:: same here. http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@187 PS1, Line 187: // tuple. Uses a linked hash map for consistent display in explain. > min-max pruning is discussed in the header, but dictionary pruning is not. Thanks. http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@624 PS1, Line 624: addNotEmptyCollections(collectionConjuncts); > A bit unclear to me as well. However, if I understand this literally, we on Thanks for adding the comments. This way round it makes more sense. It's still not obvious to me why in the prior code and before the check for slotIds.size() != 1, tupleIds.size() == 1 had to be true. There seems to be some guarantee that conjuncts in the scanners work on a single tuple only, which surprises me. http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@656 PS1, Line 656: // Check to see if the conjunct evaluates to true when the slot is NULL > Done nit: conjuncts still lacks the _ :) http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@111 PS4, Line 111: the Duplicate word http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1099 PS4, Line 1099: output.append(detailPrefix + "parquet statistics predicates: " + This one compiles the string differently than the others, probably I did this myself. While you're here and if you don't mind, you could convert it to String.format(). http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116 PS4, Line 1116: List<Integer> totalIdxList = Lists.newArrayList(); This is more of a guess, but can't you pass entry.getValue() to newArrayList? http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test: http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@265 PS1, Line 265: # - number of projections and their nesting depth > I think there's some benefit to explicitly listing the areas tested/to-test I understand your point, let's keep it here. http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@275 PS1, Line 275: 1 > there are two files in the table. one of them is dictionary encoded for int That makes sense. Should we add a quick comment for the next person to come by ("Only one of the files in this table is dictionary encoded")? -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Wed, 13 Dec 2017 20:39:00 +0000 Gerrit-HasComments: Yes