GitHub user dilipbiswal opened a pull request:
https://github.com/apache/spark/pull/15332
[SPARK-10634][SQL] Support Parquet logical type TIMESTAMP_MILLIS
## What changes were proposed in this pull request?
**Description** from JIRA
The TimestampType in Spark SQL has microsecond precision. Ideally, Spark SQL
timestamp values would be converted to Parquet TIMESTAMP_MICROS, but
parquet-mr does not support that type yet.
For the read path, we should be able to read TIMESTAMP_MILLIS Parquet
values and pad them with a zero microsecond part.
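The padding described above amounts to scaling a millisecond count to a microsecond count. The sketch below illustrates the idea in isolation. `millisToMicros` and the class name are hypothetical and do not correspond to any API in the PR. It assumes TIMESTAMP_MILLIS values arrive as INT64 milliseconds since the Unix epoch, which is how Parquet stores that logical type.

```java
class TimestampMillisRead {
    // Hypothetical helper (not from the PR): widen a TIMESTAMP_MILLIS value
    // (milliseconds since the epoch) to microsecond precision by appending a
    // zero microsecond part. multiplyExact throws on overflow instead of wrapping.
    static long millisToMicros(long millis) {
        return Math.multiplyExact(millis, 1000L);
    }

    public static void main(String[] args) {
        long millis = 1475300968123L; // 2016-10-01T05:49:28.123Z
        System.out.println(millisToMicros(millis)); // 1475300968123000
    }
}
```

The three trailing zeros are the padded microsecond part. No information is lost in this direction.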
For the write path, Spark currently writes timestamps as INT96, as Impala
and Hive do. Alternatively, a separate SQL option could let users write
Spark SQL timestamp values as TIMESTAMP_MILLIS. In that case, the
microsecond part is truncated.
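The truncation on the write path is the inverse scaling, and it discards the sub-millisecond digits. The following is a minimal sketch. `microsToMillis` is a hypothetical helper, not code from the PR. The choice of `Math.floorDiv` is an assumption: it rounds pre-1970 (negative) values toward the earlier millisecond rather than toward zero, and the PR text does not state which behavior it uses.

```java
class TimestampMillisWrite {
    // Hypothetical helper (not from the PR): truncate a Spark SQL timestamp
    // (microseconds since the epoch) to TIMESTAMP_MILLIS precision.
    // Math.floorDiv rounds toward negative infinity, so -1 us becomes -1 ms.
    static long microsToMillis(long micros) {
        return Math.floorDiv(micros, 1000L);
    }

    public static void main(String[] args) {
        long micros = 1475300968123456L;
        long millis = microsToMillis(micros);
        System.out.println(millis);         // 1475300968123
        // Reading the value back cannot recover the dropped 456 microseconds.
        System.out.println(millis * 1000L); // 1475300968123000
    }
}
```

Round-tripping a value through TIMESTAMP_MILLIS is lossless only when its microsecond part is already zero.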
## How was this patch tested?
Added new tests in ParquetQuerySuite and ParquetIOSuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dilipbiswal/spark parquet-time-millis
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15332.patch
----
commit 4e040e51a8ffa5d2196f5cf0966dc6a309191167
Author: Dilip Biswal
Date: 2016-10-01T05:49:28Z
[SPARK-10634] Support Parquet logical type TIMESTAMP_MILLIS
----