Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 1:

(19 comments)

The approach looks good to me, I mainly have some clarifying questions.

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-parquet-scanner.h@452
PS1, Line 452: Only scalar columns can be
             :   /// dictionary filtered
Can you mention in the comment why?


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h@167
PS1, Line 167:     return thrift_dict_filter_conjuncts_;
You could make the calling code cleaner by returning an empty vector here. 
You'd need have a static empty vector though, so I'm not sure it's worth it.


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h@382
PS1, Line 382:   /// Dictionary filtering eligible conjuncts for each slot.
Can you add to the comment that this can be nullptr?


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc@90
PS1, Line 90: auto
We usually only use auto for iterators and const& in for loops. I think this 
would be more readable with the type spelled out, especially since it's not 
clear from its name what 'second' is. It seems to be 
std::vector<ScalarExprEvaluator*>, right?

I also cannot figure out why you use a pointer here instead of a const ref to 
the vector.


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc@97
PS1, Line 97: entry.tuple_id
you could just use 'tuple_id' here (from L86).


http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift@202
PS1, Line 202: between a TupleId, SlotId pair to a
nit: I think it should be "mapping between ... and ..." or "mapping from ... to 
..."? I may be wrong though.


http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift@209
PS1, Line 209: optional
Shouldn't they all be required?


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@187
PS1, Line 187:   // Map from pairs of TupleDescriptor, SlotIds to indices in 
PlanNodes.conjuncts_ and
I think this comment could benefit from a description of what makes a 
slot/conjunct eligible (depending on a single slot). Overall I think it would 
help to have a paragraph that describes how dictionary filtering works but I'm 
not yet sure where it would fit best.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@188
PS1, Line 188: collectionConjuncts
nit: missing _


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@188
PS1, Line 188:   // collectionConjuncts that are eligible for dictionary 
filtering. The TupleDescriptor
nit: I'd phrase it as "slots in the TupleDescriptor of this scan node map to 
indices...".


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@624
PS1, Line 624:     Preconditions.checkState(tupleIds.size() == 1);
It doesn't seem obvious to me why this is true. Can you add a comment?


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@656
PS1, Line 656:    * Walks through conjuncts and collectionConjuncts and 
populates
nit: trailing _s


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@668
PS1, Line 668: notEmptyCollections_.contains(entry.getKey())
This could be factored into its own method together with the comment above. I 
suspect the comment is hard to understand if you don't already know what it's 
about. Maybe add a reference to SelectStmt#registerIsNotEmptyPredicates as an 
additional breadcrumb?


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1010
PS1, Line 1010:     for (Map.Entry<Pair<TupleDescriptor, SlotId>, 
List<Integer>> entry : dictionaryFilterConjuncts_.entrySet()) {
long line


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1075
PS1, Line 1075:       if (!dictionaryFilterConjuncts_.isEmpty()) {
This check looks like it's not needed.


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@a2
PS1, Line 2:
Why is this not needed anymore?


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@265
PS1, Line 265: # - number of projections and their nesting depth
I suspect this comment may rot if we keep it here, but to me it looks like it 
could go into the commit message.


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@275
PS1, Line 275: 1
Why is only one row group filtered out here and not 2?


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@374
PS1, Line 374: # Array struct value at depth 3. Bottom not required. No 
matches. Multiple nested values projected.
nit: long lines



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Comment-Date: Sat, 09 Dec 2017 00:15:01 +0000
Gerrit-HasComments: Yes

Reply via email to