MaxGekk commented on PR #35756: URL: https://github.com/apache/spark/pull/35756#issuecomment-1195190895
> // In avro the date "0001-01-01 00:00:00" will be read as "0002-01-01 00:00:00"

Is it Avro-specific? Do you observe the same with Parquet?

One more thing: the config `AVRO_REBASE_MODE_IN_WRITE` is set to `EXCEPTION` by default, so you should see the exception below, since you are trying to write ancient timestamps:
```
org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.WRITE_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Avro files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set "spark.sql.avro.datetimeRebaseModeInWrite" to "LEGACY" to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set the config to "CORRECTED" to write the datetime values as it is, if you are sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.
```
I don't see that you modified `spark.sql.avro.datetimeRebaseModeInWrite` in the test. That is odd and should definitely be fixed.
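For illustration, a minimal sketch of what such a test would have to do before writing ancient timestamps (the config key is taken from the exception message above; `spark`, `df`, and the output path are assumed placeholders, and `spark.conf.set` only takes effect for the session it is called on):

```python
# Sketch, not the PR's actual test code: a test writing ancient timestamps
# to Avro must choose a rebase mode explicitly, otherwise the default
# EXCEPTION mode raises SparkUpgradeException.
# `spark` is an assumed active SparkSession with spark-avro on the classpath,
# and `df` an assumed DataFrame containing pre-1900 timestamps.
spark.conf.set("spark.sql.avro.datetimeRebaseModeInWrite", "CORRECTED")
df.write.format("avro").save("/tmp/ancient_ts")  # illustrative output path
```

Running the same write under both `LEGACY` and `CORRECTED` (and reading the files back) is what would make the test's expectations about "0001-01-01" meaningful.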

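To make the "calendar difference" concrete: the gap between the legacy hybrid (Julian before 1582-10-15) calendar and the proleptic Gregorian calendar is a matter of days, not a whole year, which is why a one-year shift like "0001-01-01" becoming "0002-01-01" looks suspicious. A self-contained sketch (not Spark's actual rebase code) using the standard Fliegel–Van Flandern Julian Day Number formulas, which assume integer division truncating toward zero:

```python
# Julian Day Numbers for the same calendar date under the Julian vs.
# proleptic Gregorian calendars, showing the day-level gap that
# SPARK-31404's rebase modes compensate for.

def _trunc_div(a, b):
    # Integer division truncating toward zero (Python's // floors instead,
    # which would break these formulas for negative intermediate values).
    return int(a / b)

def gregorian_jdn(y, m, d):
    # Fliegel & Van Flandern formula, proleptic Gregorian calendar.
    a = _trunc_div(m - 14, 12)
    return (_trunc_div(1461 * (y + 4800 + a), 4)
            + _trunc_div(367 * (m - 2 - 12 * a), 12)
            - _trunc_div(3 * _trunc_div(y + 4900 + a, 100), 4)
            + d - 32075)

def julian_jdn(y, m, d):
    # Companion formula for the Julian calendar.
    return (367 * y
            - _trunc_div(7 * (y + 5001 + _trunc_div(m - 9, 7)), 4)
            + _trunc_div(275 * m, 9)
            + d + 1729777)

# The Gregorian reform: Julian 1582-10-04 is immediately followed by
# Gregorian 1582-10-15 -- consecutive physical days, 10 calendar days apart.
assert julian_jdn(1582, 10, 4) + 1 == gregorian_jdn(1582, 10, 15)

# At year 1 the two calendars differ by only 2 days, not a year:
print(gregorian_jdn(1, 1, 1) - julian_jdn(1, 1, 1))  # -> 2
```

So a correct LEGACY-mode rebase of `0001-01-01` shifts it by a couple of days; a full-year shift points at a bug or a missing rebase-mode setting in the test, which is the question raised above.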