Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
......................................................................


Patch Set 6:

(14 comments)

Thanks for the review. Please see PS7.

http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 382:   /// Min/max statistics contexts, owned by instances of this class.
> they are not owned by the class instances themselves, they're allocated in 
Done


Line 386:   /// 'min_max_conjunct_ctxs_', which are used when evaluating 
parquet::Statistics.
> somewhat incomprehensible comment.
Done


Line 476:   Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, 
bool* skip_row_group);
> you're not evaluating stats, you're evaluating predicates on the stats
Done


http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: enum class StatsField { MIN, MAX };
> move into ColumnStatsBase so it's clear what stats this is referring to.
The other two values are i64 null_count and i64 distinct_count, for which I 
think we should have proper getters, i.e. we don't need to consider their type 
and method of decoding when reading them. Should we prefer an enum-based getter 
for those over ColumnStatsBase::ReadNullCountFromThrift() and 
ReadDistinctCountFromThrift()?


Line 114:   static bool ReadFromThrift(const parquet::Statistics& thrift_stats, 
const StatsField& stats_field,
> move function bodies into parquet-column-stats-inl.h
Done


http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 297:     collectionConjuncts_.clear();
> true, but those should be identical, given that we rewrite constants as lit
Constant rewriting is controlled by the enable_expr_rewrites query option, 
changing the code to filter on literals and turning the query option off 
disables statistics for predicates like "a < 3 - 3". There is a test for this 
in parquet_stats.test:L206 that would fail. Let me know if you prefer to switch 
to literals.


http://gerrit.cloudera.org:8080/#/c/6032/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 85:  * conjuncts, that are passed to the backend and will be evaluated 
against the
> "conjuncts that"
Done


Line 154:   private List<Expr> minMaxPlanConjuncts_ = Lists.newArrayList();
> "plan conjuncts" doesn't really mean anything. original conjuncts?
Done


Line 302:    * Builds a predicate to evaluate parquet::Statistics by copying 
'inputSlot' into
> "to evaluate against"
Done


Line 347:       // Only constant exprs can be evaluated against the 
parquet::Statistics. This
> drop "the"
Done


Line 349:       if (!constExpr.isConstant()) continue;
> as pointed out before, you shouldn't see non-literal constants here
See my reply in PS4, constant folding depends on the enable_expr_rewrites query 
option.


http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 1047:    runtime filters: RF000 -> c_custkey
> extended is fine.
Done


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 1: # Test HDFS scan node.
> run this file in extended mode so you see the stats predicates.
Done


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 206: # Test that without expr rewrite and thus without constant folding, 
constant exprs still
> what's the point of those tests?
See my comment regarding literal vs. constant exprs in PS4. These tests make 
sure that we don't break support for constant exprs.


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Matthew Mulder <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes

Reply via email to