Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 4:

(11 comments)

This looks good to me, I only added minor clean-up comments and will do a final 
pass on the next PS.

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h@90
PS4, Line 90:   /// sequence-based file
This looks like a rebase artifact. I often look at diffs between patchsets (and 
I think others do, too) so if possible, it helps to do a final rebase at the 
end instead of rebaseing in the middle. Sometimes rebases are inevitable, in 
which case it's fine of course.


http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@90
PS4, Line 90: std::
We usually drop the std:: prefix in cc files.


http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@96
PS4, Line 96: std::
same here.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@187
PS1, Line 187:   // tuple. Uses a linked hash map for consistent display in 
explain.
> min-max pruning is discussed in the header, but dictionary pruning is not.
Thanks.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@624
PS1, Line 624:         addNotEmptyCollections(collectionConjuncts);
> A bit unclear to me as well. However, if I understand this literally, we on
Thanks for adding the comments. This way round it makes more sense. It's still 
not obvious to me why in the prior code and before the check for slotIds.size() 
!= 1, tupleIds.size() == 1 had to be true. There seems to be some guarantee 
that conjuncts in the scanners work on a single tuple only, which surprises me.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@656
PS1, Line 656:     // Check to see if the conjunct evaluates to true when the 
slot is NULL
> Done
nit: conjuncts still lacks the _ :)


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@111
PS4, Line 111: the
Duplicate word


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1099
PS4, Line 1099:         output.append(detailPrefix + "parquet statistics 
predicates: " +
This one compiles the string differently than the others, probably I did this 
myself. While you're here and if you don't mind, you could convert it to 
String.format().


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116
PS4, Line 1116:         List<Integer> totalIdxList = Lists.newArrayList();
This is more of a guess, but can't you pass entry.getValue() to newArrayList?


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@265
PS1, Line 265: # - number of projections and their nesting depth
> I think there's some benefit to explicitly listing the areas tested/to-test
I understand your point, let's keep it here.


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@275
PS1, Line 275: 1
> there are two files in the table. one of them is dictionary encoded for int
That makes sense. Should we add a quick comment for the next person to come by 
("Only one of the files in this table is dictionary encoded")?



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Dec 2017 20:39:00 +0000
Gerrit-HasComments: Yes

Reply via email to