Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )
Change subject: IMPALA-4993: extend dictionary filtering to collections ...................................................................... Patch Set 9: (9 comments) Change looks pretty good to me http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778 PS8, Line 778: } > no, I intended to omit the collection readers. this partitions all nested s Makes sense. The function comment incorrectly states they are added to 'non_filtering'. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809 PS8, Line 809: if (!state_->query_options().parquet_dictionary_filtering || dict_filter_map_.empty()) { > yup... I assumed that if its a mix of collection + scalars is ok since that My point was about whether child readers are included or not. In this code path we only add the top-level readers. In the other code path we recursively collect all nested readers as well. I think we should either be consistent or add a comment explaining that in some cases only the top-level readers are added. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845 PS8, Line 845: // where some data pages are dictionary-encoded and others are plain > that's correct, removed. I think I carried that conservative check over fro Agree. Would be good to add a Preonditions check to the FE SlotDescriptor c'tors. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222 PS8, Line 222: typedef std::map<SlotId, std::vector<ScalarExprEvaluator*>> > I'll comment here on the <tupleid, slotid> vs slotid decision here (covers Makes sense. Would be good to clarify this in TupleDescriptor and/or SlotDescriptor as you suggested. SlotIds and TupleIds are unique within a query. A SlotId belongs to exactly one TupleId. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548 PS8, Line 548: private boolean isCollectionNotEmpty(TupleDescriptor desc) { > its used in a couple of places and I wanted to keep the same checks in both I'm still in favor of removing. Reasons: The first Preconditions check only seems useful in the context of this function, and even so you'll get the expected "false" return value if we probe for null. The second Preconditions check is not necessary because the type of a tuple is always StructType (see type_ member in TupleDescriptor). http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1102 PS8, Line 1102: // in the same order as the normal conjuncts. Sort the indices so that the > Done Sorry this was my being vague. I meant creating a helper function for adding the dictionary filtering part of the explain string. It's a good chunk of code that deals with that part. I think the grouping an be done in that function and doesn't need a separate helper. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116 PS8, Line 1116: for (Integer idx : totalIdxList) { > didn't want to mutate a class member for the purpose of printing diags. The perTupleConjuncts already have a new list, so no member is modified would be modified. http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@639 PS9, Line 639: List<TupleId> tupleIds = Lists.newArrayList(); remove, Preconditions check in L646 must trivially be true after L645 http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test: http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@330 PS8, Line 330: parquet statistics predicates: c_custkey > 0, o.o_orderkey > 0, l.l_partkey > 0 > hmm... actually I followed the pattern on L324-5 which separates the predic that pattern makes more sense to me as well, so the oddity is really an existing issue with the "parquet statistics predicates" -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 9 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Fri, 12 Jan 2018 00:29:30 +0000 Gerrit-HasComments: Yes
