MaxGekk commented on a change in pull request #25981: [SPARK-28420][SQL]
Support the `INTERVAL` type in `date_part()`
URL: https://github.com/apache/spark/pull/25981#discussion_r332194347
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -2067,6 +2082,10 @@ object DatePart {
224
> SELECT _FUNC_('SECONDS', timestamp'2019-10-01 00:00:01.000001');
1.000001
+ > SELECT _FUNC_('days', interval 1 year 10 months 5 days);
Review comment:
> So, feel free to tell me to go look up a reference, but I am sort of
confused about the semantics.
Semantic is defined by internal representation of Catalyst's
`CalendarIntervalType`. Purpose of the type is to represent components of an
interval not duration of interval (see the discussions in
https://github.com/apache/spark/pull/25022). `CalendarIntervalType` stores the
`year`, `months`, `weeks`, `days`, `hours`, `minutes`, `seconds`,
`milliseconds`, `microseconds` components in 2 independent variables -
`months` and `microseconds`. Due to independence and unlimited (limited by
long/int size) of the `microseconds` field, we can store more than 31 days in
`microseconds`.
Because purpose of `CalendarIntervalType` is to represent interval as
components but not duration, the same duration in microseconds can have
multiple representations in Catalyst `INTERVAL` type. Let's say:
390000000 microseconds = `interval 5 minutes 90 seconds` = `interval 6
minutes 30 seconds` = `interval 1 minute 330 seconds` and so on.
When an user request components of an interval, we need to select one of its
representation. We do that by constructing "normalized" representation in where
number of `years` and `days` is unlimited (limited by int or long) and `months`
should be in `[0..12)`, `hours` is in `[0, 12)`, `minutes` is `[0, 60)` ...
`microseconds` is in `[0, 1000000)`. `seconds` and `milliseconds` are special
cases because they have the fractional part as well but they limited too.
`seconds` - `[0, 59.999999]`, milliseconds - `[0, 59999.999]`. This behavior is
inherited from PostgreSQL. In this way, there is only one "normalized"
representation of any interval from which the `date_part()` function extracts
components.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]