Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )
Change subject: IMPALA-4993: extend dictionary filtering to collections ...................................................................... Patch Set 1: (19 comments) The approach looks good to me, I mainly have some clarifying questions. http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-parquet-scanner.h@452 PS1, Line 452: Only scalar columns can be : /// dictionary filtered Can you mention in the comment why? http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h@167 PS1, Line 167: return thrift_dict_filter_conjuncts_; You could make the calling code cleaner by returning an empty vector here. You'd need have a static empty vector though, so I'm not sure it's worth it. http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h@382 PS1, Line 382: /// Dictionary filtering eligible conjuncts for each slot. Can you add to the comment that this can be nullptr? http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc@90 PS1, Line 90: auto We usually only use auto for iterators and const& in for loops. I think this would be more readable with the type spelled out, especially since it's not clear from its name what 'second' is. It seems to be std::vector<ScalarExprEvaluator*>, right? I also cannot figure out why you use a pointer here instead of a const ref to the vector. http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc@97 PS1, Line 97: entry.tuple_id you could just use 'tuple_id' here (from L86). http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift@202 PS1, Line 202: between a TupleId, SlotId pair to a nit: I think it should be "mapping between ... and ..." or "mapping from ... to ..."? I may be wrong though. http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift@209 PS1, Line 209: optional Shouldn't they all be required? http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@187 PS1, Line 187: // Map from pairs of TupleDescriptor, SlotIds to indices in PlanNodes.conjuncts_ and I think this comment could benefit from a description of what makes a slot/conjunct eligible (depending on a single slot). Overall I think it would help to have a paragraph that describes how dictionary filtering works but I'm not yet sure where it would fit best. http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@188 PS1, Line 188: collectionConjuncts nit: missing _ http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@188 PS1, Line 188: // collectionConjuncts that are eligible for dictionary filtering. The TupleDescriptor nit: I'd phrase it as "slots in the TupleDescriptor of this scan node map to indices...". http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@624 PS1, Line 624: Preconditions.checkState(tupleIds.size() == 1); It doesn't seem obvious to me why this is true. Can you add a comment? http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@656 PS1, Line 656: * Walks through conjuncts and collectionConjuncts and populates nit: trailing _s http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@668 PS1, Line 668: notEmptyCollections_.contains(entry.getKey()) This could be factored into its own method together with the comment above. I suspect the comment is hard to understand if you don't already know what it's about. Maybe add a reference to SelectStmt#registerIsNotEmptyPredicates as an additional breadcrumb? http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1010 PS1, Line 1010: for (Map.Entry<Pair<TupleDescriptor, SlotId>, List<Integer>> entry : dictionaryFilterConjuncts_.entrySet()) { long line http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1075 PS1, Line 1075: if (!dictionaryFilterConjuncts_.isEmpty()) { This check looks like it's not needed. http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test: http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@a2 PS1, Line 2: Why is this not needed anymore? http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@265 PS1, Line 265: # - number of projections and their nesting depth I suspect this comment may rot if we keep it here, but to me it looks like it could go into the commit message. http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@275 PS1, Line 275: 1 Why is only one row group filtered out here and not 2? http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@374 PS1, Line 374: # Array struct value at depth 3. Bottom not required. No matches. Multiple nested values projected. nit: long lines -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Comment-Date: Sat, 09 Dec 2017 00:15:01 +0000 Gerrit-HasComments: Yes
