Maxim Gekk created SPARK-31211:
----------------------------------

             Summary: Failure on loading 1000-02-29 from parquet saved by Spark 
2.4.5
                 Key: SPARK-31211
                 URL: https://issues.apache.org/jira/browse/SPARK-31211
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Maxim Gekk


Save valid date in Julian calendar by Spark 2.4.5 in a leap year, for instance 
1000-02-29:
{code}
$ export TZ="America/Los_Angeles"
{code}
{code:scala}
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
scala> 
df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap")

scala> val df = 
Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
df: org.apache.spark.sql.DataFrame = [date: date]

scala> df.show
+----------+
|      date|
+----------+
|1000-02-29|
+----------+

scala> 
df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
{code}

Load the parquet files back by Spark 3.1.0-SNAPSHOT:
{code:scala}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
+----------+
|      date|
+----------+
|1000-03-06|
+----------+


scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)

scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap 
year
        at java.time.LocalDate.create(LocalDate.java:429)
        at java.time.LocalDate.of(LocalDate.java:269)
        at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to