beliefer opened a new pull request #34769: URL: https://github.com/apache/spark/pull/34769
### What changes were proposed in this pull request?
This PR fixes the issue reported at https://github.com/apache/spark/pull/33588#issuecomment-978719988.

The root cause is that ORC writes and reads timestamps with the local time zone by default, and the local time zone can differ between write and read. If the ORC writer writes a timestamp using one local time zone (e.g. America/Los_Angeles) and the ORC reader reads it using another (e.g. Europe/Amsterdam), the timestamp value comes back different. If the writer writes the timestamp with the UTC time zone and the reader also reads it with UTC, the value round-trips correctly. This PR makes ORC write and read timestamps with the UTC time zone by calling `useUTCTimestamp(true)` on readers and writers.

The related ORC source:
https://github.com/apache/orc/blob/3f1e57cf1cebe58027c1bd48c09eef4e9717a9e3/java/core/src/java/org/apache/orc/impl/WriterImpl.java#L525
https://github.com/apache/orc/blob/1f68ac0c7f2ae804b374500dcf1b4d7abe30ffeb/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L1184

Another problem is Spark 3.3 or newer reading ORC files written by Spark 3.2 or earlier. Because those older Spark versions wrote timestamps with the local time zone, such files must not be read with the UTC time zone; otherwise the timestamp values come back incorrect.

### Why are the changes needed?
Fix the timestamp bug for ORC.

### Does this PR introduce _any_ user-facing change?
No. ORC TIMESTAMP_NTZ is a new feature that has not been released yet.

### How was this patch tested?
New tests.
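The time-zone drift described above can be reproduced with plain `java.time`, with no ORC involved. The class and the `writeAsEpoch`/`readFromEpoch` helpers below are hypothetical stand-ins for how a writer and reader interpret a zone-less (NTZ) wall-clock value; they are not ORC APIs:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class NtzRoundTrip {
    // Hypothetical: a writer stores a zone-less wall-clock value as an epoch
    // instant by interpreting it in the writer's time zone.
    static long writeAsEpoch(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant().getEpochSecond();
    }

    // Hypothetical: a reader turns the stored instant back into a wall-clock
    // value by interpreting it in the reader's time zone.
    static LocalDateTime readFromEpoch(long epochSecond, ZoneId readerZone) {
        return ZonedDateTime
            .ofInstant(Instant.ofEpochSecond(epochSecond), readerZone)
            .toLocalDateTime();
    }

    public static void main(String[] args) {
        LocalDateTime ntz = LocalDateTime.of(2021, 6, 1, 12, 0, 0);
        ZoneId la  = ZoneId.of("America/Los_Angeles");
        ZoneId ams = ZoneId.of("Europe/Amsterdam");
        ZoneId utc = ZoneId.of("UTC");

        // Writer and reader use different local zones: the wall clock drifts
        // by the offset difference (PDT is UTC-7, CEST is UTC+2).
        LocalDateTime drifted = readFromEpoch(writeAsEpoch(ntz, la), ams);
        System.out.println(drifted); // 2021-06-01T21:00

        // Both sides pin UTC (the effect of useUTCTimestamp(true) on both
        // writer and reader): the wall clock round-trips unchanged.
        LocalDateTime stable = readFromEpoch(writeAsEpoch(ntz, utc), utc);
        System.out.println(stable.equals(ntz)); // true
    }
}
```

This also illustrates the compatibility caveat: a file whose timestamps were stored via the writer's local zone (old Spark) must be decoded with that same convention, not with UTC.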
