bersprockets commented on a change in pull request #34741:
URL: https://github.com/apache/spark/pull/34741#discussion_r758851983
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
##########
@@ -531,13 +533,16 @@ object OrcUtils extends Logging {
   }

   def fromOrcNTZ(ts: Timestamp): Long = {
-    DateTimeUtils.millisToMicros(ts.getTime) +
+    val utcMicros = DateTimeUtils.millisToMicros(ts.getTime) +
       (ts.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
+    val micros = DateTimeUtils.fromUTCTime(utcMicros, TimeZone.getDefault.getID)
+    micros
   }

   def toOrcNTZ(micros: Long): OrcTimestamp = {
-    val seconds = Math.floorDiv(micros, MICROS_PER_SECOND)
-    val nanos = (micros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
+    val utcMicros = DateTimeUtils.toUTCTime(micros, TimeZone.getDefault.getID)
##########
Review comment:
There is at least one issue with dates before 1883-11-17. Standardized (railway)
time zones didn't exist before then, and the java.time classes (which
fromUTCTime/toUTCTime use) apply the historical local-mean-time offsets for that era.
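To make that concrete, here is a small sketch of my own (plain Java, since the behavior lives in the JDK APIs those utilities call; America/Los_Angeles is just an example zone). Before the 1883 standardization, tzdata records a local-mean-time offset, and java.time honors it:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class LmtOffsetDemo {
    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");

        // Before US standard (railway) time, tzdata records local mean time.
        ZoneOffset pre = la.getRules().getOffset(Instant.parse("1880-01-01T00:00:00Z"));
        System.out.println(pre); // -07:52:58 (LMT), not -08:00

        // After the 1883 transition, the familiar standard offset applies.
        ZoneOffset post = la.getRules().getOffset(Instant.parse("1900-01-01T00:00:00Z"));
        System.out.println(post); // -08:00
    }
}
```

Any micros shifted through java.time for such old instants pick up that odd sub-minute offset, which the writer-side arithmetic never used.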
Also, I surmise there will be an additional issue with pre-Gregorian values,
since ORC assumes the Julian calendar when "shifting" values on read.
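As a rough illustration of that calendar gap (again my own sketch, not from the PR): the legacy hybrid Julian/Gregorian `GregorianCalendar` and proleptic-Gregorian java.time disagree on the epoch day of a pre-1582 date, so mixing the two APIs shifts such values by days:

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarGapDemo {
    public static void main(String[] args) {
        // Legacy calendar: hybrid Julian/Gregorian, so 1500-01-01 is a Julian date.
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        hybrid.clear();
        hybrid.set(1500, 0, 1); // January is month 0 in the legacy API
        long hybridEpochDay = Math.floorDiv(hybrid.getTimeInMillis(), 86_400_000L);

        // java.time: proleptic Gregorian, so the same field values name a
        // different instant.
        long prolepticEpochDay = LocalDate.of(1500, 1, 1).toEpochDay();

        // The calendars disagree by several days in this era, which is why
        // shifting micros written under Julian assumptions through
        // proleptic-Gregorian utilities corrupts them.
        System.out.println(hybridEpochDay - prolepticEpochDay);
    }
}
```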
Even my POC has issues when the writer is in the Pacific/Pago_Pago time zone
and the reader is in some other time zone, because I also used the
fromUTCTime/toUTCTime utility functions.
When dealing with hybrid Julian times, ORC doesn't have issues shifting
values between time zones that didn't exist yet, since the old time-related
classes (java.util.TimeZone and Calendar) didn't account for that.
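A quick Java sketch of that difference between the old and new APIs (my illustration; the exact offsets assume current tzdata): for a pre-1883 instant, the legacy `java.util.TimeZone` reports the plain standard offset, while java.time reports the LMT offset:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.util.TimeZone;

public class LegacyVsJavaTimeDemo {
    public static void main(String[] args) {
        long epoch1880 = Instant.parse("1880-01-01T00:00:00Z").toEpochMilli();

        // Legacy API: no notion of local mean time, reports the standard offset.
        int legacyMillis = TimeZone.getTimeZone("America/Los_Angeles").getOffset(epoch1880);

        // java.time: honors the tzdata LMT entry for the same instant.
        int modernSeconds = ZoneId.of("America/Los_Angeles")
            .getRules().getOffset(Instant.ofEpochMilli(epoch1880)).getTotalSeconds();

        System.out.println(legacyMillis / 1000); // standard offset, in seconds
        System.out.println(modernSeconds);       // LMT offset, in seconds
    }
}
```

So a round trip that writes with the old classes but shifts on read with fromUTCTime/toUTCTime mixes the two offset conventions.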
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]