HyukjinKwon commented on a change in pull request #25708: [SPARK-28141][SQL] Support special date values
URL: https://github.com/apache/spark/pull/25708#discussion_r321988260
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
##########
@@ -17,24 +17,30 @@
package org.apache.spark.sql.catalyst.util
-import java.time.LocalDate
+import java.time.{LocalDate, ZoneId}
import java.util.Locale
+import DateTimeUtils._
+
sealed trait DateFormatter extends Serializable {
def parse(s: String): Int // returns days since epoch
def format(days: Int): String
}
class Iso8601DateFormatter(
pattern: String,
+ zoneId: ZoneId,
locale: Locale) extends DateFormatter with DateTimeFormatterHelper {
@transient
private lazy val formatter = getOrCreateFormatter(pattern, locale)
override def parse(s: String): Int = {
- val localDate = LocalDate.parse(s, formatter)
- DateTimeUtils.localDateToDays(localDate)
+ val specialDate = convertSpecialDate(s.trim, zoneId)
Review comment:
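One thing I am a bit worried about, though: if users intentionally set a date `pattern` with leading or trailing whitespace (because they know their input contains those spaces), parsing will now break. For example, with the pattern below, an input such as `" 2018 Dec"` is trimmed to `"2018 Dec"`, which no longer matches: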
```scala
DateFormatter(" yyyy MMM").parse("2018 Dec")
```
```
Text '2018 Dec' could not be parsed at index 0
java.time.format.DateTimeParseException: Text '2018 Dec' could not be parsed at index 0
  at java.base/java.time.format.DateTimeFormatter.parseResolved0(Unknown Source)
  at java.base/java.time.format.DateTimeFormatter.parse(Unknown Source)
  at java.base/java.time.LocalDate.parse(Unknown Source)
  at org.apache.spark.sql.catalyst.util.Iso8601DateFormatter.parse(DateFormatter.scala:36)
  at org.apache.spark.sql.util.DateFormatterSuite.$anonfun$new$15(DateFormatterSuite.scala:94)
  at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
```
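The same failure can be reproduced with plain `java.time`, outside Spark. This is a minimal sketch, assuming the formatter is roughly a case-insensitive `DateTimeFormatterBuilder` with the day of month defaulted to 1 (an approximation of Spark's helper, not its actual code); `TrimVsPattern` is a hypothetical name:

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder, DateTimeParseException}
import java.time.temporal.ChronoField
import java.util.Locale
import scala.util.Try

object TrimVsPattern {
  // Approximation of a formatter for " yyyy MMM" (not Spark's actual helper):
  // case-insensitive, with the day of month defaulted to 1 so that a year and
  // a month are enough to build a LocalDate.
  val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern(" yyyy MMM") // the leading space is a literal that must match
    .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
    .toFormatter(Locale.US)

  def parse(s: String): Try[LocalDate] = Try(LocalDate.parse(s, formatter))

  def main(args: Array[String]): Unit = {
    val input = " 2018 Dec"
    println(parse(input)) // Success(2018-12-01)
    // Trimming removes the space the pattern expects, so parsing fails at index 0.
    parse(input.trim).failed.foreach {
      case e: DateTimeParseException => println(s"failed at index ${e.getErrorIndex}")
      case e => throw e
    }
  }
}
```

The padded input matches the literal leading space in the pattern, while the trimmed input fails at index 0, which is the same kind of error as in the stack trace above.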
While trimming makes sense in general, this case differs from `stringToDate` because here users can set the `pattern` (and, with the Java 8 APIs, Spark can now match the pattern exactly). Is this change necessary, or does it just make the issue easier to fix?
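For reference, one possible alternative (a hypothetical sketch, not taken from the PR) is to trim only when checking for a special value and to pass the untrimmed string to the formatter. `SpecialDateSketch`, `specialDateDays` and `formatterFor` are illustrative names; the keyword set (`epoch`, `now`, `today`, `yesterday`, `tomorrow`) is assumed to mirror PostgreSQL's special date values, and the formatter setup is the same approximation as before:

```scala
import java.time.{LocalDate, ZoneId}
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField
import java.util.Locale

object SpecialDateSketch {
  // Hypothetical stand-in for convertSpecialDate: days since epoch for a
  // recognized keyword (assumed PostgreSQL-style set), otherwise None.
  def specialDateDays(s: String, zoneId: ZoneId): Option[Int] = {
    val today = LocalDate.now(zoneId)
    val date = s.toLowerCase(Locale.ROOT) match {
      case "epoch"         => Some(LocalDate.ofEpochDay(0))
      case "now" | "today" => Some(today)
      case "yesterday"     => Some(today.minusDays(1))
      case "tomorrow"      => Some(today.plusDays(1))
      case _               => None
    }
    date.map(_.toEpochDay.toInt)
  }

  // Approximation of a pattern-based formatter; defaulting the day of month
  // to 1 is an assumption so that "yyyy MMM" can produce a LocalDate.
  def formatterFor(pattern: String): DateTimeFormatter =
    new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .appendPattern(pattern)
      .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
      .toFormatter(Locale.US)

  // Trim only for the keyword lookup; the original string goes to the
  // formatter untouched, so whitespace in the pattern still has to match.
  def parse(s: String, formatter: DateTimeFormatter, zoneId: ZoneId): Int =
    specialDateDays(s.trim, zoneId).getOrElse(
      LocalDate.parse(s, formatter).toEpochDay.toInt)
}
```

With `formatterFor(" yyyy MMM")`, both `"  epoch "` and `" 2018 Dec"` parse, so special values stay whitespace-tolerant while a pattern with intentional whitespace keeps matching exactly.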