Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics ......................................................................
Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 382: /// Min/max statistics contexts, owned by instances of this class. they are not owned by the class instances themselves, they're allocated in state_->obj_pool() Line 386: /// 'min_max_conjunct_ctxs_', which are used when evaluating parquet::Statistics. somewhat incomprehensible comment. this is used only for a specific function, and then only to avoid the dynamic allocation overhead. point this out, right now it seems very mysterious. Line 476: Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, bool* skip_row_group); you're not evaluating stats, you're evaluating predicates on the stats http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 31: enum class StatsField { MIN, MAX }; move into ColumnStatsBase so it's clear what stats this is referring to. also, add enums for all fields in Statistics. Line 114: static bool ReadFromThrift(const parquet::Statistics& thrift_stats, const StatsField& stats_field, move function bodies into parquet-column-stats-inl.h http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 297: collectionConjuncts_.clear(); > My reasoning was that any constant expression could be evaluated against th true, but those should be identical, given that we rewrite constants as literals. http://gerrit.cloudera.org:8080/#/c/6032/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 85: * conjuncts, that are passed to the backend and will be evaluated against the "conjuncts that" Line 154: private List<Expr> minMaxPlanConjuncts_ = Lists.newArrayList(); "plan conjuncts" doesn't really mean anything. original conjuncts? Line 302: * Builds a predicate to evaluate parquet::Statistics by copying 'inputSlot' into "to evaluate against" Line 347: // Only constant exprs can be evaluated against the parquet::Statistics. This drop "the" Line 349: if (!constExpr.isConstant()) continue; as pointed out before, you shouldn't see non-literal constants here http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 1047: runtime filters: RF000 -> c_custkey > I changed the code to track the original predicates and display them instea extended is fine. yes, "parquet statistics" is more concrete. http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: Line 1: # Test HDFS scan node. run this file in extended mode so you see the stats predicates. http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 206: # Test that without expr rewrite and thus without constant folding, constant exprs still what's the point of those tests? -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Matthew Mulder <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-HasComments: Yes
