klaus-xiong commented on PR #35756:
URL: https://github.com/apache/spark/pull/35756#issuecomment-1195199943
> > // In avro the date "0001-01-01 00:00:00" will be read as "0002-01-01 00:00:00"
>
> Is it avro specific? Do you observe the same w/ parquet? One more thing,
> the config `AVRO_REBASE_MODE_IN_WRITE` is set to `EXCEPTION` by default, so,
> you should see the exception below since you are trying to write ancient
> timestamps:
>
> ```
> org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.WRITE_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0:
> writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z
> into Avro files can be dangerous, as the files may be read by Spark 2.x
> or legacy versions of Hive later, which uses a legacy hybrid calendar that
> is different from Spark 3.0+'s Proleptic Gregorian calendar. See more
> details in SPARK-31404. You can set "spark.sql.avro.datetimeRebaseModeInWrite" to "LEGACY" to rebase the
> datetime values w.r.t. the calendar difference during writing, to get maximum
> interoperability. Or set the config to "CORRECTED" to write the datetime
> values as it is, if you are sure that the written files will only be read by
> Spark 3.0+ or other systems that use Proleptic Gregorian calendar.
> ```
>
> I don't see that you modified `spark.sql.avro.datetimeRebaseModeInWrite`
> in the test. That's weird, and should be fixed definitely.
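
For reference, the calendar difference that the exception message describes can be reproduced with plain JDK classes. The sketch below is not Spark's rebase code, and `CalendarRebaseSketch` and its methods are hypothetical names used only for illustration. It assumes `java.util.GregorianCalendar` with its default cutover as a stand-in for the legacy hybrid calendar, and `java.time.LocalDate` as the Proleptic Gregorian one.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class CalendarRebaseSketch {
    // Days since 1970-01-01 for the date label y-m-d in the Proleptic Gregorian calendar.
    static long prolepticEpochDay(int y, int m, int d) {
        return LocalDate.of(y, m, d).toEpochDay();
    }

    // Days since 1970-01-01 for the same label in the hybrid Julian/Gregorian calendar
    // (Julian before the default cutover of 1582-10-15, Gregorian afterwards).
    static long hybridEpochDay(int y, int m, int d) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();            // reset all fields, so the time of day is midnight UTC
        cal.set(y, m - 1, d);   // Calendar months are 0-based
        return Math.floorDiv(cal.getTimeInMillis(), 86_400_000L);
    }

    public static void main(String[] args) {
        // An ancient date: the two calendars disagree on the day count.
        System.out.println(hybridEpochDay(1000, 1, 1) - prolepticEpochDay(1000, 1, 1));
        // On and after the cutover they agree.
        System.out.println(hybridEpochDay(1582, 10, 15) - prolepticEpochDay(1582, 10, 15));
    }
}
```

The same date label maps to a different day count before 1582-10-15 and to the same one from that date on, which is the mismatch the `LEGACY` and `CORRECTED` modes in the message refer to.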

OK, I will do more testing on the file formats and data, and also on the
Avro and Parquet configs.
One more thing: is the result below correct in SQL?
```
== Results ==
!== Correct Answer - 6 == == Spark Answer - 6 ==
struct<a:string> struct<a:string>
![1000-01-01 00:00:00.123] [1000-01-01 08:00:00.123]
![1582-10-15 08:00:00.456] [1582-10-15 16:00:00.456]
![1883-11-18 19:59:59.999] [1883-11-19 03:59:59.999]
![1883-11-18 20:00:00.001] [1883-11-19 04:00:00.001]
![1900-11-18 20:00:00.789] [1900-11-19 04:00:00.789]
![1970-01-01 00:00:00] [1970-01-01 08:00:00]
org.scalatest.exceptions.TestFailedException:
Results do not match for query:
Timezone:
sun.util.calendar.ZoneInfo[id="GMT-08:00",offset=-28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null
```
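
In the output above, every Spark answer is exactly 8 hours later than the expected value, and the test time zone is GMT-08:00 (`offset=-28800000` ms). Here is a minimal sketch in plain `java.time` of one possible reading: the expected wall-clock value is taken in the test zone but rendered in UTC. It only reproduces the arithmetic and does not confirm that this is what happens in the test. `OffsetShiftSketch` and `reinterpret` are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class OffsetShiftSketch {
    // Takes a wall-clock value in zone `from` and renders the same instant in zone `to`.
    static LocalDateTime reinterpret(LocalDateTime local, ZoneId from, ZoneId to) {
        return local.atZone(from).withZoneSameInstant(to).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId testZone = ZoneId.of("GMT-08:00"); // zone reported in the failure output
        String[] expectedRows = {
            "1000-01-01T00:00:00.123",
            "1883-11-18T19:59:59.999",
            "1970-01-01T00:00:00"
        };
        for (String row : expectedRows) {
            LocalDateTime expected = LocalDateTime.parse(row);
            LocalDateTime shifted = reinterpret(expected, testZone, ZoneOffset.UTC);
            // Print both values and the gap between them in hours.
            System.out.println(expected + " -> " + shifted
                + " (" + Duration.between(expected, shifted).toHours() + "h)");
        }
    }
}
```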
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]