cloud-fan commented on a change in pull request #34741:
URL: https://github.com/apache/spark/pull/34741#discussion_r758910943
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
##########
@@ -531,13 +533,16 @@ object OrcUtils extends Logging {
}
   def fromOrcNTZ(ts: Timestamp): Long = {
-    DateTimeUtils.millisToMicros(ts.getTime) +
+    val utcMicros = DateTimeUtils.millisToMicros(ts.getTime) +
       (ts.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
+    val micros = DateTimeUtils.fromUTCTime(utcMicros, TimeZone.getDefault.getID)
+    micros
   }

   def toOrcNTZ(micros: Long): OrcTimestamp = {
-    val seconds = Math.floorDiv(micros, MICROS_PER_SECOND)
-    val nanos = (micros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
+    val utcMicros = DateTimeUtils.toUTCTime(micros, TimeZone.getDefault.getID)
Review comment:
I'm trying to understand this issue better. From the ORC source code, it seems like:
1. The ORC writer shifts the timestamp value w.r.t. the JVM local timezone and records that timezone in the file footer.
2. The ORC reader shifts the timestamp value w.r.t. both the JVM local timezone and the writer's recorded timezone.

If that's the case, it seems like we only need to change the ORC reader to shift the timestamp value by the writer's timezone?
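
To make the idea concrete, here is a minimal sketch (not the actual patch) of what a reader-side-only conversion could look like. It reuses the helpers already visible in the diff (`DateTimeUtils.millisToMicros` / `fromUTCTime`); the `writerZoneId` parameter is hypothetical and stands for the timezone recorded in the ORC file footer, however it ends up being plumbed through:

```scala
import java.sql.Timestamp

import org.apache.spark.sql.catalyst.util.DateTimeConstants.{MICROS_PER_MILLIS, NANOS_PER_MICROS}
import org.apache.spark.sql.catalyst.util.DateTimeUtils

object OrcNTZSketch {
  // Sketch only: shift the value read from ORC by the writer's recorded timezone
  // instead of the reader's JVM default timezone.
  def fromOrcNTZ(ts: Timestamp, writerZoneId: String): Long = {
    // Same micros extraction as in the diff above.
    val utcMicros = DateTimeUtils.millisToMicros(ts.getTime) +
      (ts.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
    // Shift w.r.t. the writer's timezone (hypothetical writerZoneId) rather than
    // TimeZone.getDefault, so the reader no longer depends on its own JVM zone.
    DateTimeUtils.fromUTCTime(utcMicros, writerZoneId)
  }
}
```

Whether `fromUTCTime` is the right direction of shift here is part of what needs to be confirmed against the ORC reader behavior described above.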
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]