GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/19769
[SPARK-12297][SQL] Adjust timezone for int96 data from impala
## What changes were proposed in this pull request?
Int96 timestamp data written by Impala is stored slightly differently from
data written by Hive and Spark: the two use a different timezone offset. This
PR adds an option, "spark.sql.parquet.int96TimestampConversion" (false by
default), that adjusts timestamps if and only if the writer is Impala (or, more
precisely, if the parquet file's "createdBy" metadata does not start with
"parquet-mr"). This matches the existing behavior in Hive from HIVE-9482.
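
As a rough illustration of the rule described above (not the code in this patch), the decision can be sketched as a small predicate. The function name `should_adjust_int96` and the sample "createdBy" strings are assumptions made for this example; only the option name and the "parquet-mr" prefix check come from the description.

```python
# Prefix that identifies files written by parquet-mr (used by Hive and Spark).
PARQUET_MR_PREFIX = "parquet-mr"


def should_adjust_int96(created_by: str, conversion_enabled: bool) -> bool:
    """Decide whether int96 timestamps read from a file should be adjusted.

    created_by: the parquet file's "createdBy" metadata string.
    conversion_enabled: the value of
        spark.sql.parquet.int96TimestampConversion (false by default).

    Hypothetical helper for illustration; it is not part of Spark's API.
    """
    if not conversion_enabled:
        # Option is off: never adjust, regardless of the writer.
        return False
    # Option is on: adjust only when the writer is not parquet-mr.
    return not created_by.startswith(PARQUET_MR_PREFIX)


if __name__ == "__main__":
    # Illustrative createdBy values, not taken from real files.
    samples = [
        "impala version 2.9.0 (build abc)",
        "parquet-mr version 1.8.2 (build abc)",
    ]
    for created_by in samples:
        print(created_by, "->", should_adjust_int96(created_by, True))
```

With the option left at its default of false, the predicate returns false for every file, so existing reads are unaffected unless the option is turned on.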
## How was this patch tested?
Unit test added; existing tests run via Jenkins.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/squito/spark SPARK-12297_skip_conversion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19769.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19769
----
commit e36453c2118fa3c45f424536ea787a95f0328687
Author: Imran Rashid <[email protected]>
Date: 2017-11-15T18:44:10Z
wip
commit 2c71453cead0025deaeadddbbad1c3a79e49a99f
Author: Imran Rashid <[email protected]>
Date: 2017-11-15T22:48:18Z
test and a bunch of plumbing in place
commit 592663393686d31a1d759e0e69540c10ee99dc69
Author: Imran Rashid <[email protected]>
Date: 2017-11-16T18:38:10Z
works, needs cleanup
commit e9ecd16defd82e9a33fc31fa200f961d2dad9d2e
Author: Imran Rashid <[email protected]>
Date: 2017-11-16T19:35:24Z
cleanup, test for predicate pushdown
----