[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

Alex Behm (Code Review) Thu, 11 Jan 2018 16:30:06 -0800

Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )


Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 9:

(9 comments)

Change looks pretty good to me

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778
PS8, Line 778:     }
> no, I intended to omit the collection readers. this partitions all nested s
Makes sense. The function comment incorrectly states they are added to 
'non_filtering'.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809
PS8, Line 809:   if (!state_->query_options().parquet_dictionary_filtering || 
dict_filter_map_.empty()) {
> yup... I assumed that if its a mix of collection + scalars is ok since that
My point was about whether child readers are included or not. In this code path 
we only add the top-level readers. In the other code path we recursively 
collect all nested readers as well.

I think we should either be consistent or add a comment explaining that in some 
cases only the top-level readers are added.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845
PS8, Line 845:   // where some data pages are dictionary-encoded and others are 
plain
> that's correct, removed. I think I carried that conservative check over fro
Agree. Would be good to add a Preonditions check to the FE SlotDescriptor 
c'tors.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222
PS8, Line 222:   typedef std::map<SlotId, std::vector<ScalarExprEvaluator*>>
> I'll comment here on the <tupleid, slotid> vs slotid decision here (covers
Makes sense.

Would be good to clarify this in TupleDescriptor and/or SlotDescriptor as you 
suggested.

SlotIds and TupleIds are unique within a query. A SlotId belongs to exactly one 
TupleId.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548
PS8, Line 548:   private boolean isCollectionNotEmpty(TupleDescriptor desc) {
> its used in a couple of places and I wanted to keep the same checks in both
I'm still in favor of removing. Reasons:

The first Preconditions check only seems useful in the context of this 
function, and even so you'll get the expected "false" return value if we probe 
for null.

The second Preconditions check is not necessary because the type of a tuple is 
always StructType (see type_ member in TupleDescriptor).


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1102
PS8, Line 1102:         // in the same order as the normal conjuncts. Sort the 
indices so that the
> Done
Sorry this was my being vague. I meant creating a helper function for adding 
the dictionary filtering part of the explain string. It's a good chunk of code 
that deals with that part.

I think the grouping an be done in that function and doesn't need a separate 
helper.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116
PS8, Line 1116:         for (Integer idx : totalIdxList) {
> didn't want to mutate a class member for the purpose of printing diags.
The perTupleConjuncts already have a new list, so no member is modified would 
be modified.


http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@639
PS9, Line 639:     List<TupleId> tupleIds = Lists.newArrayList();
remove, Preconditions check in L646 must trivially be true after L645


http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@330
PS8, Line 330:    parquet statistics predicates: c_custkey > 0, o.o_orderkey > 
0, l.l_partkey > 0
> hmm... actually I followed the pattern on L324-5 which separates the predic
that pattern makes more sense to me as well, so the oddity is really an 
existing issue with the "parquet statistics predicates"



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:29:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

Reply via email to