Lars Volker has uploaded a new patch set (#4).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet 
Statistics for remaining types
......................................................................

IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for 
remaining types

This change adds functionality to write parquet::Statistics for Decimal,
String, and Timestamp values.

It also switches from populating the deprecated fields 'min' and 'max'
to populating the new fields 'min_value' and 'max_value' in
parquet::Statistics, which were added in a parquet-format PR.

The HdfsParquetScanner prefers the new fields when they are populated.
For tables with only the deprecated fields populated, it reads them only
if the column is of a simple numeric type, i.e. boolean, integer, or
floating point.
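
The selection rule described above can be sketched roughly as follows.
This is an illustration only, not the code in hdfs-parquet-scanner.cc:
the Statistics class, the type-name strings, and select_min_max() are
hypothetical stand-ins, and the sketch assumes that both bounds must be
present for a pair of fields to be used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical type names for illustration; not Impala's actual type enum.
SIMPLE_NUMERIC_TYPES = {
    "BOOLEAN", "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE",
}


@dataclass
class Statistics:
    """Stand-in for parquet::Statistics; values are encoded bytes."""
    min: Optional[bytes] = None        # deprecated field
    max: Optional[bytes] = None        # deprecated field
    min_value: Optional[bytes] = None  # new field
    max_value: Optional[bytes] = None  # new field


def select_min_max(stats: Statistics,
                   col_type: str) -> Optional[Tuple[bytes, bytes]]:
    """Return the (min, max) pair to use, or None if stats must be ignored."""
    # Prefer the new fields whenever they are populated.
    if stats.min_value is not None and stats.max_value is not None:
        return stats.min_value, stats.max_value
    # Fall back to the deprecated fields only for simple numeric types.
    if (col_type in SIMPLE_NUMERIC_TYPES
            and stats.min is not None and stats.max is not None):
        return stats.min, stats.max
    # Otherwise (e.g. a string column with only deprecated stats), skip them.
    return None
```

Under these assumptions, a string column that carries only 'min' and
'max' yields None, so the scanner would not use its statistics.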

This change removes from the tests the comparison of the Parquet
statistics we write against those written by Hive, since Hive does not
write the new fields. Instead, it adds a Parquet file written by Hive
that uses the deprecated fields for its statistics, and a test using
that file exercises the fallback logic for supported types.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_parquet_stats.py
14 files changed, 686 insertions(+), 187 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
