cloud-fan commented on a change in pull request #28137: [SPARK-31361][SQL] 
Rebase datetime in parquet/avro according to file written Spark version
URL: https://github.com/apache/spark/pull/28137#discussion_r404760772
 
 

 ##########
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2522,19 +2522,6 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
-  val LEGACY_PARQUET_REBASE_DATETIME_IN_WRITE =
-    buildConf("spark.sql.legacy.parquet.rebaseDateTimeInWrite.enabled")
 
 Review comment:
  People would assume storage systems use the ISO standard calendar 
(proleptic Gregorian calendar) by default. The Parquet spec doesn't define the 
calendar explicitly, but it refers to the Java 8 time API, which kind of 
implicitly requires the proleptic Gregorian calendar.
   
  Spark has this issue because we used the Java 7 time API before 3.0.
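  To illustrate the difference between the two APIs (a standalone sketch, not 
Spark code): the Java 8 `java.time` classes use the proleptic Gregorian 
calendar, while the older `java.util.GregorianCalendar` models the historical 
hybrid Julian/Gregorian calendar, in which the ten days 1582-10-05 through 
1582-10-14 never existed:

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarGap {
    public static void main(String[] args) {
        // Java 8 time API: proleptic Gregorian calendar, where
        // 1582-10-05 is an ordinary, valid date.
        LocalDate proleptic = LocalDate.of(1582, 10, 5);
        System.out.println("proleptic: " + proleptic);

        // Pre-Java-8 API: hybrid Julian/Gregorian calendar. The days
        // 1582-10-05..1582-10-14 fall into the cutover gap, and the
        // lenient calendar silently resolves them past the cutover.
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        hybrid.clear();
        hybrid.set(1582, Calendar.OCTOBER, 5);
        System.out.println("hybrid day-of-month: "
            + hybrid.get(Calendar.DAY_OF_MONTH)); // resolves to 15
    }
}
```

  The same triple of (year, month, day) fields therefore denotes different 
physical days depending on which calendar interprets it, which is exactly why 
datetime values written by Spark 2.x can be misread by 3.0 and vice versa.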
   
  Hive switched to the proleptic Gregorian calendar in 3.1, and AFAIK this 
caused some trouble in other systems, which kind of proves that other systems 
were already using the proleptic Gregorian calendar.
   
   Impala has added a config `convert_legacy_hive_parquet_utc_timestamps` to 
work around it: https://issues.apache.org/jira/browse/IMPALA-3933
   
  Presto also hit issues when reading Hive tables with timestamps: 
https://github.com/prestodb/presto/issues/12180
   
  That's why I think it's better to write datetime values without rebasing, as 
that's the correct data.
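  For context on what "rebasing" means here, a hypothetical sketch (not 
Spark's actual rebase utilities; the class and method names are made up): 
rebasing keeps the local date fields a user sees stable while moving the value 
between the two calendars, which shifts the underlying day count for ancient 
dates:

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class RebaseSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Hypothetical rebase: reinterpret the local date fields produced by the
    // hybrid Julian/Gregorian calendar as a proleptic Gregorian date, so the
    // displayed date (e.g. "1000-01-01") is preserved while the physical
    // days-since-epoch value changes.
    static long rebaseHybridToProlepticDays(long hybridMillis) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(hybridMillis);
        return LocalDate.of(
            cal.get(Calendar.YEAR),
            cal.get(Calendar.MONTH) + 1,   // Calendar months are 0-based
            cal.get(Calendar.DAY_OF_MONTH)).toEpochDay();
    }

    public static void main(String[] args) {
        // 1000-01-01 00:00 UTC in the hybrid calendar.
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1000, Calendar.JANUARY, 1);
        long rawDays = cal.getTimeInMillis() / MILLIS_PER_DAY; // hybrid day count
        long rebasedDays = rebaseHybridToProlepticDays(cal.getTimeInMillis());
        // For dates this old the two calendars disagree by several days,
        // so rawDays != rebasedDays.
        System.out.println(rawDays + " vs " + rebasedDays);
    }
}
```

  Writing without rebasing means storing the proleptic Gregorian day count 
as-is, so readers that follow the ISO calendar interpret it correctly.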

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 