cloud-fan commented on a change in pull request #34741:
URL: https://github.com/apache/spark/pull/34741#discussion_r758910943
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
##########
@@ -531,13 +533,16 @@ object OrcUtils extends Logging {
}
   def fromOrcNTZ(ts: Timestamp): Long = {
-    DateTimeUtils.millisToMicros(ts.getTime) +
+    val utcMicros = DateTimeUtils.millisToMicros(ts.getTime) +
       (ts.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
+    val micros = DateTimeUtils.fromUTCTime(utcMicros, TimeZone.getDefault.getID)
+    micros
   }

   def toOrcNTZ(micros: Long): OrcTimestamp = {
-    val seconds = Math.floorDiv(micros, MICROS_PER_SECOND)
-    val nanos = (micros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
+    val utcMicros = DateTimeUtils.toUTCTime(micros, TimeZone.getDefault.getID)
Review comment:
I'm trying to understand this issue better. From the ORC source code, it seems like:
1. The ORC writer shifts the timestamp value w.r.t. the JVM local timezone and records that timezone in the file footer.
2. The ORC reader shifts the timestamp value w.r.t. both the JVM local timezone and the writer's recorded timezone.

If that's the case, it seems like we only need to change the ORC reader to shift the timestamp value by the writer's timezone?
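
To make the idea concrete, here is a minimal sketch (not the actual patch) of what a reader-side-only conversion could look like. It reuses the helpers already visible in the diff (`DateTimeUtils.millisToMicros` / `fromUTCTime`); the `writerZoneId` parameter is hypothetical and stands for the timezone recorded in the ORC file footer, however it ends up being plumbed through:

```scala
import java.sql.Timestamp

import org.apache.spark.sql.catalyst.util.DateTimeConstants.{MICROS_PER_MILLIS, NANOS_PER_MICROS}
import org.apache.spark.sql.catalyst.util.DateTimeUtils

object OrcNTZSketch {
  // Sketch only: shift the value read from ORC by the writer's recorded timezone
  // instead of the reader's JVM default timezone.
  def fromOrcNTZ(ts: Timestamp, writerZoneId: String): Long = {
    // Same micros extraction as in the diff above.
    val utcMicros = DateTimeUtils.millisToMicros(ts.getTime) +
      (ts.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
    // Shift w.r.t. the writer's timezone (hypothetical writerZoneId) rather than
    // TimeZone.getDefault, so the reader no longer depends on its own JVM zone.
    DateTimeUtils.fromUTCTime(utcMicros, writerZoneId)
  }
}
```

Whether `fromUTCTime` is the right direction of shift here is part of what needs to be confirmed against the ORC reader behavior described above.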
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]