Attila Jeges has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13189 )

Change subject: IMPALA-7370: DATE: Read/Write to parquet.
......................................................................

IMPALA-7370: DATE: Read/Write to parquet.

This change is a follow-up to IMPALA-7368 and adds support for DATE
type to the parquet scanner/writer.

Parquet uses the DATE logical type for dates. The DATE logical type
annotates an INT32 that stores the number of days since the Unix epoch,
1 January 1970.
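
As an illustration of the representation described above, here is a
minimal Python sketch of the days-since-epoch encoding. The function
names are hypothetical and are not taken from the Impala sources; the
sketch relies on Python's datetime.date, which uses the proleptic
Gregorian calendar.

```python
from datetime import date, timedelta

# The Unix epoch; a DATE value of 0 corresponds to this day.
EPOCH = date(1970, 1, 1)

def date_to_days(d):
    """Encode a date as the number of days since the Unix epoch."""
    return (d - EPOCH).days

def days_to_date(days):
    """Decode a days-since-epoch value back into a date."""
    return EPOCH + timedelta(days=days)

print(date_to_days(date(1970, 1, 1)))    # 0
print(date_to_days(date(1969, 12, 31)))  # -1
print(days_to_date(10957))               # 2000-01-01
```

Dates before the epoch are encoded as negative values, which is why
a signed INT32 is needed.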

This representation introduces a parquet interoperability issue
between Impala and older versions of Hive:
- Before version 3.1, Hive used the Julian calendar to represent dates
  up to 1582-10-05 and the Gregorian calendar for dates starting with
  1582-10-15. Dates between 1582-10-05 and 1582-10-15 were lost.
- Impala uses the proleptic Gregorian calendar, which extends the
  Gregorian calendar backward to dates preceding its official
  introduction on 1582-10-15.
This means that pre-1582-10-15 dates written to a parquet table by
Hive will be read back incorrectly by Impala and vice versa.
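
The mismatch can be sketched in Python as follows. This is only an
illustration of the calendar arithmetic, not code from Hive or Impala:
the function names are hypothetical, and the day counts come from the
standard Julian Day Number formulas for the two calendars.

```python
from datetime import date, timedelta

UNIX_EPOCH_JDN = 2440588  # Julian Day Number of 1970-01-01 (Gregorian)

def _shifted(year, month):
    # Shift the year so that it starts in March, as the JDN formulas expect.
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3

def gregorian_days_since_epoch(year, month, day):
    """Days since 1970-01-01, reading the date in the proleptic Gregorian calendar."""
    y, m = _shifted(year, month)
    jdn = (day + (153 * m + 2) // 5 + 365 * y
           + y // 4 - y // 100 + y // 400 - 32045)
    return jdn - UNIX_EPOCH_JDN

def julian_days_since_epoch(year, month, day):
    """Days since 1970-01-01, reading the date in the Julian calendar."""
    y, m = _shifted(year, month)
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return jdn - UNIX_EPOCH_JDN

def read_as_proleptic_gregorian(days):
    """Decode a stored day count the way a proleptic Gregorian reader would."""
    return date(1970, 1, 1) + timedelta(days=days)

# A writer using the Julian calendar stores 1582-10-04 as this day count.
stored = julian_days_since_epoch(1582, 10, 4)
print(stored)                               # -141428
# A proleptic Gregorian reader decodes the same INT32 as a different date.
print(read_as_proleptic_gregorian(stored))  # 1582-10-14
```

Under these assumptions the same stored INT32 is decoded ten days off
for dates just before the 1582 switch, and the size of the offset
changes for earlier centuries.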

Note that Hive 3.1 switched to proleptic Gregorian calendar too, so
for Hive 3.1+ this is no longer an issue.

Change-Id: I67da03754531660bc8de3b6935580d46deae1814
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/util/bit-packing.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.parquet
A testdata/data/out_of_range_date.parquet
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
D testdata/workloads/functional-query/queries/QueryTest/date-text-only-support.test
A testdata/workloads/functional-query/queries/QueryTest/out-of-range-date.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/common/impala_connection.py
M tests/custom_cluster/test_parquet_page_index.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_scanners.py
28 files changed, 431 insertions(+), 148 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/13189/3
--
To view, visit http://gerrit.cloudera.org:8080/13189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67da03754531660bc8de3b6935580d46deae1814
Gerrit-Change-Number: 13189
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
