bersprockets commented on pull request #34712:
URL: https://github.com/apache/spark/pull/34712#issuecomment-980471687


   Unfortunately, I don't think you can use `useUTCTimestamp=true` to fix TIMESTAMP_NTZ without breaking cross-version compatibility for TIMESTAMP. I could be wrong.
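   
   For context, here is roughly where that flag lives, as a minimal sketch against the ORC Java API rather than Spark code (the path and schema are made up):
   ```
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.Path
   import org.apache.orc.{OrcFile, TypeDescription}

   val conf = new Configuration()
   val schema = TypeDescription.fromString("struct<ts:timestamp>")

   // Writer stores TIMESTAMP values normalized to UTC instead of the
   // writer's local time zone.
   val writer = OrcFile.createWriter(
     new Path("/tmp/testdata/ts_orc_utc"),
     OrcFile.writerOptions(conf).setSchema(schema).useUTCTimestamp(true))
   writer.close()

   // Only a reader that also sets useUTCTimestamp(true) gets the values
   // back unshifted; an older reader (e.g. Spark 3.2.0's) never sets it,
   // which is the shift demonstrated below.
   val reader = OrcFile.createReader(
     new Path("/tmp/testdata/ts_orc_utc"),
     OrcFile.readerOptions(conf).useUTCTimestamp(true))
   reader.close()
   ```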
   
   The `isOldOrcFile` check works fine, but only in one direction (new Spark reading files written by old Spark; a sketch of such a check is further down). In the other direction it does not work. For example:
   
   Write using Spark with this PR (running in a non-UTC timezone):
   ```
   sql("select timestamp '2021-06-01 00:00:00' ts").write.mode("overwrite").format("orc").save("/tmp/testdata/ts_orc_spark_use_utc")
   ```
   Read using Spark 3.2.0 (running in the same timezone as above):
   ```
   scala> sql("select * from 
`orc`.`/tmp/testdata/ts_orc_spark_use_utc`").show(false)
   +-------------------+
   |ts                 |
   +-------------------+
   |2021-05-31 22:00:00|
   +-------------------+
   
   scala> 
   ```
   Spark 3.2.0 doesn't know the file was written with UTC-normalized timestamps, so it adjusts them to the session time zone and the value comes back shifted by the zone offset (two hours here). That's why my [POC change](https://github.com/apache/spark/compare/master...bersprockets:orc_ntz_issue_play) did some wacky-looking stuff.
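   
   The one-way nature of the check is roughly this (a sketch with a made-up metadata key, not the actual POC code):
   ```
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.Path
   import org.apache.orc.OrcFile

   // New Spark can probe the ORC footer for a marker and special-case old
   // files, but an old reader has no such probe, so files written by new
   // Spark are still misread -- hence "only one way".
   def isOldOrcFile(path: Path, conf: Configuration): Boolean = {
     val reader = OrcFile.createReader(path, OrcFile.readerOptions(conf))
     // "spark.writer.marker" is a hypothetical metadata key; its absence
     // would mean the file predates the new writer.
     try !reader.hasMetadataValue("spark.writer.marker")
     finally reader.close()
   }
   ```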
   
   Even if it didn't break compatibility, it would be a behavior change between minor versions, and I would think such a change would need a config to toggle it.
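   
   Something like this, say (a sketch in the style of existing entries inside `org.apache.spark.sql.internal.SQLConf`; the name, default, and version are placeholders, not anything proposed in this PR):
   ```
   val ORC_WRITE_UTC_TIMESTAMP = buildConf("spark.sql.orc.writeUTCTimestamp")
     .doc("When true, pass useUTCTimestamp=true to the ORC writer. Files " +
       "written this way may be read back shifted by older Spark versions.")
     .version("3.3.0")
     .booleanConf
     .createWithDefault(false)
   ```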




