Maxim Gekk created SPARK-31211:
----------------------------------

             Summary: Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
                 Key: SPARK-31211
                 URL: https://issues.apache.org/jira/browse/SPARK-31211
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Maxim Gekk
Save a date that is valid in the Julian calendar, for instance 1000-02-29 (1000 is a Julian leap year), with Spark 2.4.5:

{code}
$ export TZ="America/Los_Angeles"
{code}
{code:scala}
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> val df = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
df: org.apache.spark.sql.DataFrame = [date: date]

scala> df.show
+----------+
|      date|
+----------+
|1000-02-29|
+----------+

scala> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap")

scala> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
{code}

Load the parquet files back with Spark 3.1.0-SNAPSHOT:

{code:scala}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
+----------+
|      date|
+----------+
|1000-03-06|
+----------+

scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)

scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year
	at java.time.LocalDate.create(LocalDate.java:429)
	at java.time.LocalDate.of(LocalDate.java:269)
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
{code}

Without rebasing, the date comes back shifted to 1000-03-06; with spark.sql.legacy.parquet.rebaseDateTime.enabled set to true, the read fails instead, because 1000-02-29 does not exist in the proleptic Gregorian calendar that Spark 3.x uses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
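The 6-day shift and the DateTimeException can both be reproduced with plain JDK classes, without Spark. A minimal sketch (the class and method names below are made up for illustration): java.util.GregorianCalendar with its Gregorian cutover pushed to Long.MAX_VALUE behaves as a pure Julian calendar, while java.time.LocalDate uses the proleptic Gregorian calendar — the two systems that Spark's rebaseJulianToGregorianDays converts between.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class JulianLeapDemo {
    /**
     * Interpret (year, month, day) as a pure Julian calendar date and return
     * the equivalent proleptic Gregorian date, i.e. how java.time labels the
     * same instant.
     */
    static LocalDate julianToGregorian(int year, int month, int dayOfMonth) {
        GregorianCalendar julian = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        // Push the Julian->Gregorian cutover infinitely far into the future,
        // turning GregorianCalendar into a pure Julian calendar.
        julian.setGregorianChange(new Date(Long.MAX_VALUE));
        julian.clear();
        julian.set(year, month - 1, dayOfMonth); // Calendar months are 0-based
        return Instant.ofEpochMilli(julian.getTimeInMillis())
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
    }

    public static void main(String[] args) {
        // 1000 is a leap year in the Julian calendar, so 1000-02-29 is a real
        // day there; java.time relabels that instant as March 6.
        System.out.println(julianToGregorian(1000, 2, 29));

        // But 1000-02-29 itself is not constructible in proleptic Gregorian:
        try {
            LocalDate.of(1000, 2, 29);
        } catch (java.time.DateTimeException e) {
            // Same message as in the executor stack trace above
            System.out.println(e.getMessage());
        }
    }
}
```

This matches both observations in the report: reading the file without rebasing shows the relabeled day 1000-03-06, and rebasing fails while trying to build the nonexistent Gregorian date 1000-02-29.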