cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add
back the legacy date/timestamp format support in CSV/JSON parser
URL: https://github.com/apache/spark/pull/27710#discussion_r384679403
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
##########
@@ -175,10 +175,30 @@ class UnivocityParser(
}
case _: TimestampType => (d: String) =>
- nullSafeDatum(d, name, nullable, options)(timestampFormatter.parse)
+ nullSafeDatum(d, name, nullable, options) { datum =>
+ try {
+ timestampFormatter.parse(datum)
+ } catch {
+ case NonFatal(e) =>
+ // If fails to parse, then tries the way used in 2.0 and 1.x for
backwards
+ // compatibility.
+ val str = UTF8String.fromString(datum)
+ DateTimeUtils.stringToTimestamp(str,
options.zoneId).getOrElse(throw e)
Review comment:
We switch the calendar from "hybrid calendar (Julian + Gregorian)" to "ISO
chronology (Proleptic Gregorian)", which affects date/timestamp values before
October 15, 1582. For example, some dates are invalid according to the new
calendar but the old calendar accepts it. When we add days to a date, the
calendar skips invalid dates.
The new calendar is widely used by RDBMS and others like parquet, snowflake,
but we should have a better rollout plan for the calendar switch, like making
the calendar configurable for Spark. Unfortunately, we catch this too late.
However, the more serious problem is the switch of the pattern string(as we
switch the JDK datetime API internally), and 3.0 can't parse some
date/timestamp strings suddenly. This is the first step to fix this issue by
adding back the legacy fallback, we are still working on fixing the 1% when
users set a custom pattern string:
1. try to support the previous pattern string with the new calendar. If
can't, go 2)
2. detect the conflicts between old and new pattern string and throw an
exception to tell users what to do (turn on the legacy config or accept the new
pattern string knowing what it means)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]