Maxim Gekk created SPARK-31662:
----------------------------------

             Summary: Reading wrong dates from dictionary encoded columns in Parquet files
                 Key: SPARK-31662
                 URL: https://issues.apache.org/jira/browse/SPARK-31662
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.0.0, 3.1.0
            Reporter: Maxim Gekk
Write dates with dictionary encoding enabled to parquet files:
{code:scala}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTimeInWrite.enabled", true)

scala> :paste
// Entering paste mode (ctrl-D to finish)

Seq.tabulate(8)(_ => "1001-01-01").toDF("dateS")
  .select($"dateS".cast("date").as("date"))
  .repartition(1)
  .write
  .option("parquet.enable.dictionary", true)
  .mode("overwrite")
  .parquet("/Users/maximgekk/tmp/parquet-date-dict")

// Exiting paste mode, now interpreting.
{code}

Read them back:
{code:scala}
scala> spark.read.parquet("/Users/maximgekk/tmp/parquet-date-dict").show(false)
+----------+
|date      |
+----------+
|1001-01-07|
|1001-01-07|
|1001-01-07|
|1001-01-07|
|1001-01-07|
|1001-01-07|
|1001-01-07|
|1001-01-07|
+----------+
{code}

*Expected values must be 1001-01-01.*

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
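The six-day shift can be illustrated outside Spark and Parquet. Below is a minimal sketch (not Spark's actual rebase code) comparing the day-since-epoch count of `java.sql.Date`, which uses the legacy hybrid Julian/Gregorian calendar that the write-side rebase targets, with `java.time.LocalDate`, which uses the proleptic Gregorian calendar that Spark 3.x assumes when reading the stored day count back:

```scala
import java.time.LocalDate
import java.util.TimeZone

// Pin the JVM to UTC so millisecond arithmetic is not skewed by a zone offset.
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))

val millisPerDay = 86400000L

// java.sql.Date uses the hybrid Julian/Gregorian calendar (Julian before 1582),
// i.e. the calendar a legacy writer labels dates in. floorDiv handles the
// negative pre-epoch millisecond values correctly.
val hybridDays =
  Math.floorDiv(java.sql.Date.valueOf("1001-01-01").getTime, millisPerDay)

// java.time.LocalDate counts days in the proleptic Gregorian calendar.
val prolepticDays = LocalDate.of(1001, 1, 1).toEpochDay

println(s"hybrid days since epoch:    $hybridDays")
println(s"proleptic days since epoch: $prolepticDays")
println(s"difference:                 ${hybridDays - prolepticDays} days")

// Interpreting the hybrid day count as a proleptic Gregorian day shifts the date:
println(LocalDate.ofEpochDay(hybridDays)) // 1001-01-07
```

In the year 1001 the two calendars differ by six days, so a day count rebased on write but not rebased back on read surfaces as 1001-01-07 instead of 1001-01-01, exactly as in the dictionary-encoded column above.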