[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

GitBox Fri, 18 Feb 2022 06:47:12 -0800


MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810064290




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
     val localStartTs = getLocalDateTime(startMicros, zoneId)
     ChronoUnit.MICROS.between(localStartTs, localEndTs)
   }
+
+  /**
+   * Adds the specified number of units to a timestamp.
+   *
+   * @param unit A keyword that specifies the interval units to add to the input timestamp.
+   * @param quantity The amount of `unit`s to add. It can be positive or negative.
+   * @param micros The input timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   * @param zoneId The time zone ID at which the operation is performed.
+   * @return A timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   */
+  def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId): Long = {
+    unit.toUpperCase(Locale.ROOT) match {
+      case "MICROSECOND" =>
+        timestampAddDayTime(micros, quantity, zoneId)
+      case "MILLISECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
+      case "SECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
+      case "MINUTE" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
+      case "HOUR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
+      case "DAY" | "DAYOFYEAR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
+      case "WEEK" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK, zoneId)
+      case "MONTH" =>
+        timestampAddMonths(micros, quantity, zoneId)
+      case "QUARTER" =>
+        timestampAddMonths(micros, quantity * 3, zoneId)
+      case "YEAR" =>
+        timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
+      case _ =>
+        throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)

Review comment:
       > I don't understand why we suddenly want to stop doing it from this PR.
   
   1. The `unit` param can be non-foldable. I made it generic intentionally. If you wonder why, I will answer that separately.
   2. As `unit` can be non-foldable, we need the runtime check.
   3. If we add checks in the parser, we will do the checks twice, at parsing and at execution, which is not necessary because
   4. We can handle foldable `unit` in codegen as an optimization where we (of course) have to check `unit` values at the optimization phase.
   
   In summary, taking into account that we will optimize foldable `unit` in codegen in the near future, where we validate the correctness of `unit`, there is no need to do that in the parser as you proposed.
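   
   To make points 2 and 4 concrete, here is a minimal sketch of the two paths. It is not the PR's code: `TimestampAddSketch`, `resolveUnit`, `addPerRow` and `addWithConstantUnit` are hypothetical names, only a few units are covered, and it uses plain `java.time` arithmetic instead of Spark's `timestampAddDayTime`/`timestampAddMonths`, so it does not reproduce Spark's exact semantics.
   
   ```scala
   import java.time.ZonedDateTime
   import java.util.Locale

   object TimestampAddSketch {
     // Hypothetical helper: maps a unit keyword to an "add" function,
     // or throws for an unknown unit. This is the single place where
     // the unit is validated.
     def resolveUnit(unit: String): (ZonedDateTime, Long) => ZonedDateTime =
       unit.toUpperCase(Locale.ROOT) match {
         case "SECOND" => (ts, q) => ts.plusSeconds(q)
         case "MINUTE" => (ts, q) => ts.plusMinutes(q)
         case "HOUR"   => (ts, q) => ts.plusHours(q)
         case "DAY"    => (ts, q) => ts.plusDays(q)
         case "MONTH"  => (ts, q) => ts.plusMonths(q)
         case other    => throw new IllegalArgumentException(s"Invalid unit: $other")
       }

     // Non-foldable unit: the value is only known per row,
     // so it has to be checked at runtime for every call.
     def addPerRow(unit: String, quantity: Long, ts: ZonedDateTime): ZonedDateTime =
       resolveUnit(unit)(ts, quantity)

     // Foldable unit: the value is a constant, so it can be resolved
     // (and validated) once, and the resulting function reused for every row.
     def addWithConstantUnit(unit: String): (Long, ZonedDateTime) => ZonedDateTime = {
       val add = resolveUnit(unit)
       (quantity, ts) => add(ts, quantity)
     }
   }
   ```
   
   In both paths the validation lives in one function, so a separate parser-side check would only repeat it.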
   
   > Example: EXTRACT, TO_BINARY, TO_NUMBER
   
   The expressions require one of their params (format, field, etc.) to be **always** foldable. In the case of `TIMESTAMPADD()`, that is an unnecessary restriction, I believe. I have faced this situation a few times in my life: some code was deployed to production after testing, and then we needed to increase the precision of intervals. Let's say we had:
   ```sql
   select timestampadd(SECOND, tbl.quantity, tbl.ts1) 
   ```
   , and we want to bump the precision of `tbl.quantity` to milliseconds. Since `quantity` is a column in a table, we can just multiply it by 1000 during a maintenance window, but what should we do with `SECOND`? We have to re-deploy the code, including going through the whole release cycle, only because a Spark dev forced us to hard-code `SECOND` in our code, for some unclear reasons.
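   
   A rough sketch of what keeping the unit next to the data could look like. Everything here is hypothetical and for illustration only: the `Event` row shape, the `shifted` function, and the choice of `java.time.Instant` instead of Spark's microsecond timestamps.
   
   ```scala
   import java.time.Instant
   import java.time.temporal.ChronoUnit
   import java.util.Locale

   object UnitAsDataSketch {
     // Hypothetical row shape: the unit is stored with the quantity
     // instead of being hard-coded in the query text.
     final case class Event(unit: String, quantity: Long, ts: Instant)

     // The unit is read from the row, so it must be checked at runtime.
     def shifted(e: Event): Instant = e.unit.toUpperCase(Locale.ROOT) match {
       case "SECOND"      => e.ts.plus(e.quantity, ChronoUnit.SECONDS)
       case "MILLISECOND" => e.ts.plus(e.quantity, ChronoUnit.MILLIS)
       case other         => throw new IllegalArgumentException(s"Invalid unit: $other")
     }
   }
   ```
   
   With this shape, a row `Event("SECOND", 5, ts)` can be migrated to `Event("MILLISECOND", 5000, ts)` by a data update alone, and `shifted` returns the same instant for both without any change to the deployed code.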



