cloud-fan commented on a change in pull request #28137: [SPARK-31361][SQL]
Rebase datetime in parquet/avro according to file written Spark version
URL: https://github.com/apache/spark/pull/28137#discussion_r404760772
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2522,19 +2522,6 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
- val LEGACY_PARQUET_REBASE_DATETIME_IN_WRITE =
- buildConf("spark.sql.legacy.parquet.rebaseDateTimeInWrite.enabled")
Review comment:
People would assume storage systems use the ISO standard calendar
(the proleptic Gregorian calendar) by default. The Parquet spec doesn't define
the calendar explicitly, but it refers to the Java 8 time API, which implicitly
requires the proleptic Gregorian calendar.
Spark has this issue because it used the Java 7 time API before 3.0.
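To make the discrepancy concrete, here is a minimal standalone sketch (not part
of this PR; the object name and the sample date 1000-01-01 are my own choices
for illustration) of how the two APIs disagree on the epoch day of the same
local date:

```scala
import java.time.LocalDate
import java.util.TimeZone
import java.util.concurrent.TimeUnit

object CalendarDiff {
  def main(args: Array[String]): Unit = {
    // Pin the default zone so the millis -> days conversion below is exact.
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"))

    // Java 7-era API: java.sql.Date uses the hybrid Julian/Gregorian calendar.
    val hybridDays = TimeUnit.MILLISECONDS.toDays(
      java.sql.Date.valueOf("1000-01-01").getTime)

    // Java 8 time API: java.time uses the proleptic Gregorian calendar.
    val prolepticDays = LocalDate.of(1000, 1, 1).toEpochDay

    // The same local date maps to different epoch days for dates before the
    // Gregorian cutover (1582-10-15), so days written with one API are
    // misread by the other unless they are rebased.
    println(s"hybrid epoch day = $hybridDays, " +
      s"proleptic epoch day = $prolepticDays, " +
      s"shift = ${prolepticDays - hybridDays} days")
  }
}
```

Running this prints a shift of several days, which is exactly the gap that
rebasing has to bridge when a file written by one calendar is read by the other.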
Hive switched to the proleptic Gregorian calendar in 3.1, and AFAIK this caused
some trouble in other systems, which suggests that those systems were already
using the proleptic Gregorian calendar.
Impala has added a config `convert_legacy_hive_parquet_utc_timestamps` to
work around it: https://issues.apache.org/jira/browse/IMPALA-3933
Presto also hit issues when reading Hive tables with timestamps:
https://github.com/prestodb/presto/issues/12180
That's why I think it's better to write datetime values without rebasing, as
that's the correct data.