MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810064290
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
val localStartTs = getLocalDateTime(startMicros, zoneId)
ChronoUnit.MICROS.between(localStartTs, localEndTs)
}
+
+ /**
+ * Adds the specified number of units to a timestamp.
+ *
+ * @param unit A keyword that specifies the interval units to add to the
input timestamp.
+ * @param quantity The amount of `unit`s to add. It can be positive or
negative.
+ * @param micros The input timestamp value, expressed in microseconds since
1970-01-01 00:00:00Z.
+ * @param zoneId The time zone ID at which the operation is performed.
+ * @return A timestamp value, expressed in microseconds since 1970-01-01
00:00:00Z.
+ */
+ def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId):
Long = {
+ unit.toUpperCase(Locale.ROOT) match {
+ case "MICROSECOND" =>
+ timestampAddDayTime(micros, quantity, zoneId)
+ case "MILLISECOND" =>
+ timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
+ case "SECOND" =>
+ timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
+ case "MINUTE" =>
+ timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
+ case "HOUR" =>
+ timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
+ case "DAY" | "DAYOFYEAR" =>
+ timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
+ case "WEEK" =>
+ timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK,
zoneId)
+ case "MONTH" =>
+ timestampAddMonths(micros, quantity, zoneId)
+ case "QUARTER" =>
+ timestampAddMonths(micros, quantity * 3, zoneId)
+ case "YEAR" =>
+ timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
+ case _ =>
+ throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)
Review comment:
> I don't understand why we suddenly want to stop doing it from this PR.
1. The `unit` param can be non-foldable. I made it generic intentionally. If
you wonder why, I will answer to that separately.
2. As `unit` can be non-foldable, we need the runtime check.
3. If we add checks in parser, we will do checks twice at parsing and at
execution... which is not necessary because
4. We can handle foldable `unit` in codegen as an optimization where we (of
course) have to check `unit` values at the optimization phase.
As summary, taking into account that we will optimize foldable `unit` in
codegen in the near future where we validate correctness of `unit`, there is no
need to do that in parser as you proposed.
> Example: EXTRACT, TO_BINARY, TO_NUMBER
The expressions require one of their param (format, field and etc) to be
**always** foldable. In the case, of `TIMESTAMPADD()` is unnecessary
restriction, I believe. I have faced to the situation a few times in my life
when some code was deployed in the production after testing, and need to
increase precision of intervals. Let's say we had:
```sql
select timestampadd(SECOND, tbl.quantity, tbl.ts1)
```
, and we wants to bump precision of `tbl.quantity` to milliseconds. Since
`quantity` is a column in a table, we can just multiply it by 1000 during a
maintenance time but we should do with `SECOND`? We have to re-deploy to code,
including pass whole release cycle, only because a Spark dev forced us to
hard-code the `SECOND` in our code, for some unclear reasons.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]