MaxGekk commented on issue #26092: [SPARK-29440][SQL] Support 
java.time.Duration as an external type of CalendarIntervalType
URL: https://github.com/apache/spark/pull/26092#issuecomment-541276742
 
 
   > ... should be determined by the current date.
   
   @hvanhovell Strictly speaking, it depends not only on the current date but on the current time zone as well. The final result of `<local date-time> + interval 1 month 10 hours` can differ between your time zone and mine because we may have different Daylight Saving Time rules. So, if you and I collect intervals and add them to some dates, we could get different results.
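   For illustration, here is a minimal sketch in plain `java.time` (not Spark code) showing that the same interval, applied to the same local date-time, covers a different amount of physical time in different zones; the dates and zones are chosen only for the example:
```scala
import java.time.{Duration, LocalDateTime, ZoneId}

// Apply "1 month 10 hours" to the same local date-time.
val start = LocalDateTime.of(2019, 2, 25, 23, 0)
val end   = start.plusMonths(1).plusHours(10)   // 2019-03-26T09:00

// Physical time elapsed between the two local date-times in a given zone.
def elapsed(zone: String): Duration = {
  val z = ZoneId.of(zone)
  Duration.between(start.atZone(z).toInstant, end.atZone(z).toInstant)
}

println(elapsed("America/Los_Angeles"))  // PT681H (spans the 2019-03-10 DST switch)
println(elapsed("Europe/Amsterdam"))     // PT682H (DST only starts 2019-03-31)
```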
   
   Also, the result depends on the order in which the interval components are applied. Adding an interval's `months` first and its `microseconds` second can produce a different date than adding `microseconds` first and `months` second, as the sketch below shows. For now, the `CalendarInterval` class does not define that order.
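   A minimal sketch of the ordering issue, again using plain `java.time` rather than Spark's own evaluation code; the start date is picked so that adding a month clamps the day-of-month:
```scala
import java.time.LocalDateTime

// The same interval "1 month 10 hours" applied in a different component order
// yields different results, because adding months clamps the day-of-month to
// the last day of the target month.
val start = LocalDateTime.of(2019, 1, 30, 20, 0)

val monthsFirst = start.plusMonths(1).plusHours(10)  // 2019-03-01T06:00
val hoursFirst  = start.plusHours(10).plusMonths(1)  // 2019-02-28T06:00
```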
   
   > ... the assumption that a month has a fixed number of seconds. This is 
just not true ...
   
   Maybe this is not true, but Spark has already made such an assumption in some places, for example:
   
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/GroupStateImpl.scala#L161-L168
   
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala#L29-L31
   
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L631
   Other DBMSs like PostgreSQL also have a predefined constant month duration:
   
https://github.com/postgres/postgres/blob/97c39498e5ca9208d3de5a443a2282923619bf91/src/include/datatype/timestamp.h#L77
 which is used in the calculation of interval length: 
https://github.com/postgres/postgres/blob/bffe1bd68457e43925c362d8728ce3b25bdf1c94/src/backend/utils/adt/timestamp.c#L5016-L5022
   
   I just want to say that we can make the same assumption that other DBMSs do.
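   To make the assumption concrete, here is a hypothetical sketch of a conversion to `java.time.Duration` that mirrors PostgreSQL's `DAYS_PER_MONTH = 30` constant; it is not Spark's actual code, and the names are illustrative:
```scala
import java.time.Duration

// Assumed fixed month length, mirroring PostgreSQL's DAYS_PER_MONTH constant.
val SECONDS_PER_DAY   = 24L * 60 * 60
val DAYS_PER_MONTH    = 30L
val SECONDS_PER_MONTH = DAYS_PER_MONTH * SECONDS_PER_DAY  // 2,592,000 seconds

// Hypothetical conversion of a (months, microseconds) interval to a Duration
// under the fixed-month-length assumption.
def toDuration(months: Int, microseconds: Long): Duration =
  Duration.ofSeconds(months * SECONDS_PER_MONTH)
    .plusNanos(microseconds * 1000L)
```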
   
   > If you really want to go down this path than we need to make sure an 
interval can only be month or micro second based.
   
   This is what I have been trying to say in many PRs like #25022 and 
https://github.com/apache/spark/pull/25981#discussion_r332571726
   
   One more thing: where do intervals in Spark come from? They can be specified in SQL queries (via interval literals written by a user; they cannot be loaded from datasources because the interval type is not supported there for now) or appear as the result of `timestamp1 - timestamp2`. The latter (timestamp subtraction) always sets `months` to zero: 
https://github.com/apache/spark/blob/eecef75350efda3efa7c5acb3b5eaa0d2fc8a552/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2111
   So, `months` can be non-zero only if a user specifies it in a SQL query, and your objection applies only to that use case. If the user has strong concerns about the average month duration, they could change the interval literal, or we could fail the query when the `months` component is non-zero in the conversion to `Duration`, as sketched below. This is arguable, but we could introduce a special SQL config for this and fail the query with a clear error message, if you insist.
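   A rough sketch of that stricter option (the function name and signature are hypothetical, not an existing Spark API):
```scala
import java.time.Duration

// Refuse to convert an interval whose `months` component is non-zero,
// since months have no fixed length.
def toDurationStrict(months: Int, microseconds: Long): Duration = {
  require(months == 0,
    "Cannot convert an interval with a non-zero months component to a java.time.Duration")
  Duration.ofSeconds(microseconds / 1000000L, (microseconds % 1000000L) * 1000L)
}
```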
