srowen commented on a change in pull request #25981: [SPARK-28420][SQL] Support
the `INTERVAL` type in `date_part()`
URL: https://github.com/apache/spark/pull/25981#discussion_r332209319
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -2067,6 +2082,10 @@ object DatePart {
224
> SELECT _FUNC_('SECONDS', timestamp'2019-10-01 00:00:01.000001');
1.000001
+ > SELECT _FUNC_('days', interval 1 year 10 months 5 days);
Review comment:
Yeah, I get why the months and microseconds parts are split, and I can see why that's the
convenient implementation given the current internal representation. Maybe the
details of the semantics aren't that important for practical purposes.
But it does seem like this proposal doesn't match PostgreSQL in at least the `SELECT
date_part('hour', INTERVAL '4 hours 3 minutes');` case, though following PostgreSQL's
behavior is impractical given the internal representation.
`SELECT date_part('hour', INTERVAL '1 month 1 day');` gives 24, right? Because the
days are separable from the months.
`SELECT date_part('month', INTERVAL '1 year 1 month');` gives 13, not 1, right?
Because the months part isn't separable from the years.
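To make that arithmetic concrete, here's a rough sketch of how those two results fall out if an interval is stored as a (months, microseconds) pair with days folded into the microseconds. The names and helpers are illustrative only, not the actual Spark internals:

```scala
// Illustrative sketch only: assumes an interval is stored as (months, microseconds),
// with days folded into the microseconds component.
object IntervalPartSketch {
  private val MicrosPerSecond = 1000000L
  private val MicrosPerMinute = 60L * MicrosPerSecond
  private val MicrosPerHour   = 60L * MicrosPerMinute
  private val MicrosPerDay    = 24L * MicrosPerHour

  // 'hour' would be derived only from the microseconds component.
  def hourPart(micros: Long): Long = micros / MicrosPerHour

  // 'month' would be derived only from the months component; years were already
  // folded into it, so nothing reduces it modulo 12.
  def monthPart(totalMonths: Int): Int = totalMonths

  def main(args: Array[String]): Unit = {
    // INTERVAL '1 month 1 day': months = 1, micros = one day's worth
    println(hourPart(MicrosPerDay))   // prints 24
    // INTERVAL '1 year 1 month': stored as 13 total months
    println(monthPart(13))            // prints 13
  }
}
```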
This feels inconsistent. What if we construed the semantics to always mean
'the given interval expressed in the given units'? That's consistent, but it doesn't
quite sound like what `date_part` does, since the result is no longer a 'part'.
Am I right about that so far?