[ https://issues.apache.org/jira/browse/SPARK-31598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-31598:
------------------------------------

    Assignee: Bruce Robbins

> LegacySimpleTimestampFormatter incorrectly interprets pre-Gregorian timestamps
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-31598
>                 URL: https://issues.apache.org/jira/browse/SPARK-31598
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>             Fix For: 3.0.0, 3.1.0
>
>
> As per discussion with [~maxgekk]:
> {{LegacySimpleTimestampFormatter#parse}} misinterprets pre-Gregorian timestamps:
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
> res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> val df1 = Seq("0002-01-01 00:00:00", "1000-01-01 00:00:00", "1800-01-01 00:00:00").toDF("expected")
> df1: org.apache.spark.sql.DataFrame = [expected: string]
> scala> val df2 = df1.select('expected, to_timestamp('expected, "yyyy-MM-dd HH:mm:ss").as("actual"))
> df2: org.apache.spark.sql.DataFrame = [expected: string, actual: timestamp]
> scala> df2.show(truncate=false)
> +-------------------+-------------------+
> |expected           |actual             |
> +-------------------+-------------------+
> |0002-01-01 00:00:00|0001-12-30 00:00:00|
> |1000-01-01 00:00:00|1000-01-06 00:00:00|
> |1800-01-01 00:00:00|1800-01-01 00:00:00|
> +-------------------+-------------------+
> scala>
> {noformat}
> Legacy timestamp parsing with JSON and CSV files is correct, so apparently
> {{LegacyFastTimestampFormatter}} does not have this issue (need to double check).
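
The shifted values in the "actual" column are what one would expect if the string is parsed against the hybrid Julian/Gregorian calendar and the resulting instant is then displayed in the proleptic Gregorian calendar. The sketch below reproduces that effect outside Spark. It assumes (not stated in the ticket) that the legacy formatter delegates to java.text.SimpleDateFormat; the object name CalendarShiftSketch and its helper methods are hypothetical and are not part of Spark.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.{Locale, TimeZone}

// Hypothetical helper, not Spark code: compares two JDK parsers on the same string.
object CalendarShiftSketch {
  // java.time works in the proleptic Gregorian calendar ("uuuu" is the proleptic year).
  private val proleptic = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  // Parse with SimpleDateFormat, whose default GregorianCalendar switches to
  // Julian rules before the 1582 cutover. UTC avoids local-mean-time offsets.
  def hybridMillis(s: String): Long = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.parse(s).getTime
  }

  // Parse the same string with java.time (proleptic Gregorian), also in UTC.
  def prolepticMillis(s: String): Long =
    LocalDateTime.parse(s, proleptic).toInstant(ZoneOffset.UTC).toEpochMilli

  // Whole-day difference between the two interpretations of the same string.
  def shiftDays(s: String): Long =
    (hybridMillis(s) - prolepticMillis(s)) / 86400000L

  // Take the instant produced by the hybrid parser and print it with java.time,
  // which mimics parsing in one calendar and displaying in the other.
  def reinterpret(s: String): String =
    Instant.ofEpochMilli(hybridMillis(s)).atOffset(ZoneOffset.UTC).format(proleptic)
}
```

Under these assumptions, CalendarShiftSketch.reinterpret applied to the three inputs from the ticket should give the same values as the "actual" column: a two-day shift backwards for year 2, a five-day shift forwards for year 1000, and no shift for 1800, which is after the Gregorian cutover.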