bart-samwel commented on a change in pull request #28576:
URL: https://github.com/apache/spark/pull/28576#discussion_r427836822



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##########
@@ -31,17 +31,50 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy._
 
 trait DateTimeFormatterHelper {
+  private def getOrDefault(accessor: TemporalAccessor, field: ChronoField, 
default: Int): Int = {
+    if (accessor.isSupported(field)) {
+      accessor.get(field)
+    } else {
+      default
+    }
+  }
+
+  protected def toLocalDate(accessor: TemporalAccessor, allowMissingYear: 
Boolean): LocalDate = {
+    val year = if (accessor.isSupported(ChronoField.YEAR)) {
+      accessor.get(ChronoField.YEAR)
+    } else if (allowMissingYear) {
+      // To keep backward compatibility with Spark 2.x, we pick 1970 as the 
default value of year.
+      1970
+    } else {
+      throw new SparkUpgradeException("3.0",
+        "Year must be given in the date/timestamp string to be parsed. You can 
set " +
+          SQLConf.LEGACY_ALLOW_MISSING_YEAR_DURING_PARSING.key + " to true, to 
pick 1970 as " +
+          "the default value of year.", null)
+    }
+    val month = getOrDefault(accessor, ChronoField.MONTH_OF_YEAR, 1)

Review comment:
       I'm saying it works (i.e., it accepts all valid values) but it's 
nonsensical; it'll never return the complete timestamp that people intended.
   
   That said, I can see cases where people want to parse contiguous ranges 
(e.g. just a time, or just a month, or just a month + day) and then extract the 
parsed components using `EXTRACT` / `date_part`, or just to format them 
differently.
   
   So I guess the most permissive but still safe thing here would be to forbid 
missing "year" only if someone also requests "day" and "month", because that's 
the situation where "year" affects the valid range of "day". But if they don't 
request year *and* they don't request month, then month would default to 1 and 
there would be no impact on the valid range of days, it would always go up to 
32. I.e.:
   
   `"yyyy MM dd"` -> OK
   `"MM dd"` -> fail, need year to know valid range of `dd`
   `"MM"` -> OK
   `"dd"` -> OK
   `"hh:mm:ss"` -> OK
   `"MM dd hh:mm:ss"` -> fail, need year to know valid range of `dd`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to