MaxGekk commented on issue #26092: [SPARK-29440][SQL] Support java.time.Duration as an external type of CalendarIntervalType
URL: https://github.com/apache/spark/pull/26092#issuecomment-541276742

> ... should be determined by the current date.

@hvanhovell Strictly speaking, it depends not only on the current date but also on the current time zone. The final result of `<local date-time> + interval 1 month 10 hour` can differ between your time zone and mine because we may have different Daylight Saving Time rules. So, if you and I collect intervals and add them to some dates, we could get different results.

Also, the result depends on the order in which interval components are applied. Adding an interval's `months` to a date first and its `microseconds` second can produce a different date than adding `microseconds` first and `months` second (see the sketch at the end of this comment). For now, the `CalendarInterval` class does not define this order.

> ... the assumption that a month has a fixed number of seconds. This is just not true ...

Maybe it is not true, but Spark has already made this assumption in some places, for example:
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/GroupStateImpl.scala#L161-L168
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala#L29-L31
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L631

Other DBMSs, such as PostgreSQL, have a predefined constant month duration:
https://github.com/postgres/postgres/blob/97c39498e5ca9208d3de5a443a2282923619bf91/src/include/datatype/timestamp.h#L77
which is used in the calculation of interval length:
https://github.com/postgres/postgres/blob/bffe1bd68457e43925c362d8728ce3b25bdf1c94/src/backend/utils/adt/timestamp.c#L5016-L5022

I just want to say that such an assumption can be made, as other DBMSs do.

> If you really want to go down this path than we need to make sure an interval can only be month or micro second based.

This is what I have been trying to say in many PRs, like #25022 and https://github.com/apache/spark/pull/25981#discussion_r332571726.

One more thing: where do intervals in Spark come from? They can be specified in SQL queries (via interval literals written by a user; they cannot be loaded from data sources because the interval type is not supported there yet) or appear as the result of `timestamp1 - timestamp2`. The latter (timestamp subtraction) always sets `months` to zero:
https://github.com/apache/spark/blob/eecef75350efda3efa7c5acb3b5eaa0d2fc8a552/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2111
So, `months` can be non-zero only if a user specifies it in a SQL query, and your objection relates only to this use case. If a user has strong concerns about the average month duration, they could change the interval literal, or we could fail the query when the `months` component is non-zero in the conversion to `Duration`. This is arguable, but we could introduce a special SQL config for this and fail the query with a clear error message, if you insist (see the conversion sketch below).
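
To illustrate the order dependence mentioned above, here is a minimal, self-contained sketch using plain `java.time` rather than Spark's own interval arithmetic; the zone, date, and component values are chosen only for illustration:

```scala
import java.time.{Duration, Period, ZoneId, ZonedDateTime}

object IntervalOrderDemo extends App {
  // DST starts in this zone on 2019-03-10: local clocks jump from 02:00 to 03:00.
  val zone  = ZoneId.of("America/Los_Angeles")
  val start = ZonedDateTime.of(2019, 3, 9, 23, 30, 0, 0, zone)

  // Interval components: 1 month (calendar-based) and 3 hours (instant-based).
  val months = Period.ofMonths(1)
  val hours  = Duration.ofHours(3)

  // Months first, then hours: the 3 hours are added in April, after the DST gap.
  val monthsFirst = start.plus(months).plus(hours)
  // Hours first, then months: the 3 hours cross the DST gap, shifting the wall clock.
  val hoursFirst  = start.plus(hours).plus(months)

  println(monthsFirst) // 2019-04-10T02:30-07:00[America/Los_Angeles]
  println(hoursFirst)  // 2019-04-10T03:30-07:00[America/Los_Angeles]
}
```

The two results differ by one hour, both as local times and as instants, which is exactly why `CalendarInterval` would need to fix the order in which components are applied.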
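And here is a rough sketch of the conversion to `Duration` discussed at the end. The constant follows PostgreSQL's 30-day `DAYS_PER_MONTH` linked above, and the `failOnNonZeroMonths` flag stands in for the hypothetical SQL config; none of these names exist in Spark:

```scala
import java.time.Duration
import java.time.temporal.ChronoUnit

object IntervalToDuration {
  // Assumed fixed month length, following PostgreSQL's DAYS_PER_MONTH = 30.
  val MicrosPerMonth: Long = 30L * 24 * 60 * 60 * 1000 * 1000

  // `failOnNonZeroMonths` plays the role of the hypothetical SQL config: when
  // enabled, intervals with a non-zero months component are rejected instead
  // of being converted via the assumed average month duration.
  def toDuration(months: Int, microseconds: Long, failOnNonZeroMonths: Boolean): Duration = {
    if (failOnNonZeroMonths && months != 0) {
      throw new IllegalArgumentException(
        s"Cannot convert an interval with months = $months to java.time.Duration " +
        "without assuming a fixed month length.")
    }
    val totalMicros = Math.addExact(Math.multiplyExact(months.toLong, MicrosPerMonth), microseconds)
    Duration.of(totalMicros, ChronoUnit.MICROS)
  }
}
```

Note that for intervals produced by `timestamp1 - timestamp2`, `months` is always zero, so the conversion is exact regardless of the flag.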