MaxGekk commented on PR #35756: URL: https://github.com/apache/spark/pull/35756#issuecomment-1195190895
> // In avro the date "0001-01-01 00:00:00" will be read as "0002-01-01 00:00:00"

Is it Avro-specific? Do you observe the same with Parquet?

One more thing: the config `AVRO_REBASE_MODE_IN_WRITE` is set to `EXCEPTION` by default, so you should see the exception below, since you are trying to write ancient timestamps:
```
org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.WRITE_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Avro files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set "spark.sql.avro.datetimeRebaseModeInWrite" to "LEGACY" to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set the config to "CORRECTED" to write the datetime values as it is, if you are sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.
```
I don't see that you modified `spark.sql.avro.datetimeRebaseModeInWrite` in the test. That is odd and should definitely be fixed.
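For illustration, a minimal sketch of what such a test would have to do before writing ancient timestamps (the config key is taken from the exception message above; `spark`, `df`, and the output path are assumed placeholders, and `spark.conf.set` only takes effect for the session it is called on):

```python
# Sketch, not the PR's actual test code: a test writing ancient timestamps
# to Avro must choose a rebase mode explicitly, otherwise the default
# EXCEPTION mode raises SparkUpgradeException.
# `spark` is an assumed active SparkSession with spark-avro on the classpath,
# and `df` an assumed DataFrame containing pre-1900 timestamps.
spark.conf.set("spark.sql.avro.datetimeRebaseModeInWrite", "CORRECTED")
df.write.format("avro").save("/tmp/ancient_ts")  # illustrative output path
```

Running the same write under both `LEGACY` and `CORRECTED` (and reading the files back) is what would make the test's expectations about "0001-01-01" meaningful.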

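To make the "calendar difference" concrete: the gap between the legacy hybrid (Julian before 1582-10-15) calendar and the proleptic Gregorian calendar is a matter of days, not a whole year, which is why a one-year shift like "0001-01-01" becoming "0002-01-01" looks suspicious. A self-contained sketch (not Spark's actual rebase code) using the standard Fliegel–Van Flandern Julian Day Number formulas, which assume integer division truncating toward zero:

```python
# Julian Day Numbers for the same calendar date under the Julian vs.
# proleptic Gregorian calendars, showing the day-level gap that
# SPARK-31404's rebase modes compensate for.

def _trunc_div(a, b):
    # Integer division truncating toward zero (Python's // floors instead,
    # which would break these formulas for negative intermediate values).
    return int(a / b)

def gregorian_jdn(y, m, d):
    # Fliegel & Van Flandern formula, proleptic Gregorian calendar.
    a = _trunc_div(m - 14, 12)
    return (_trunc_div(1461 * (y + 4800 + a), 4)
            + _trunc_div(367 * (m - 2 - 12 * a), 12)
            - _trunc_div(3 * _trunc_div(y + 4900 + a, 100), 4)
            + d - 32075)

def julian_jdn(y, m, d):
    # Companion formula for the Julian calendar.
    return (367 * y
            - _trunc_div(7 * (y + 5001 + _trunc_div(m - 9, 7)), 4)
            + _trunc_div(275 * m, 9)
            + d + 1729777)

# The Gregorian reform: Julian 1582-10-04 is immediately followed by
# Gregorian 1582-10-15 -- consecutive physical days, 10 calendar days apart.
assert julian_jdn(1582, 10, 4) + 1 == gregorian_jdn(1582, 10, 15)

# At year 1 the two calendars differ by only 2 days, not a year:
print(gregorian_jdn(1, 1, 1) - julian_jdn(1, 1, 1))  # -> 2
```

So a correct LEGACY-mode rebase of `0001-01-01` shifts it by a couple of days; a full-year shift points at a bug or a missing rebase-mode setting in the test, which is the question raised above.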