MaxGekk commented on issue #26092: [SPARK-29440][SQL] Support 
java.time.Duration as an external type of CalendarIntervalType
URL: https://github.com/apache/spark/pull/26092#issuecomment-541276742
 
 
   > ... should be determined by the current date.
   
   @hvanhovell Strictly speaking, it depends not only on the current date but on the current time zone as well. The final result of `<local date-time> + interval 1 month 10 hours` can differ between your time zone and mine because we may have different Daylight Saving Time rules. So, if you and I collect intervals and add them to some dates, we could get different results.
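   For illustration, here is a minimal sketch in plain `java.time` (not Spark code) showing that the same interval, applied to the same local date-time, covers a different amount of physical time in different zones; the dates and zones are chosen only for the example:
```scala
import java.time.{Duration, LocalDateTime, ZoneId}

// Apply "1 month 10 hours" to the same local date-time.
val start = LocalDateTime.of(2019, 2, 25, 23, 0)
val end   = start.plusMonths(1).plusHours(10)   // 2019-03-26T09:00

// Physical time elapsed between the two local date-times in a given zone.
def elapsed(zone: String): Duration = {
  val z = ZoneId.of(zone)
  Duration.between(start.atZone(z).toInstant, end.atZone(z).toInstant)
}

println(elapsed("America/Los_Angeles"))  // PT681H (spans the 2019-03-10 DST switch)
println(elapsed("Europe/Amsterdam"))     // PT682H (DST only starts 2019-03-31)
```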
   
   Also, the result depends on the order in which the interval components are applied. Adding an interval's `months` first and its `microseconds` second can produce a different date than adding `microseconds` first and `months` second, as the sketch below shows. For now, the `CalendarInterval` class does not define that order.
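   A minimal sketch of the ordering issue, again using plain `java.time` rather than Spark's own evaluation code; the start date is picked so that adding a month clamps the day-of-month:
```scala
import java.time.LocalDateTime

// The same interval "1 month 10 hours" applied in a different component order
// yields different results, because adding months clamps the day-of-month to
// the last day of the target month.
val start = LocalDateTime.of(2019, 1, 30, 20, 0)

val monthsFirst = start.plusMonths(1).plusHours(10)  // 2019-03-01T06:00
val hoursFirst  = start.plusHours(10).plusMonths(1)  // 2019-02-28T06:00
```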
   
   > ... the assumption that a month has a fixed number of seconds. This is 
just not true ...
   
   Maybe this is not true, but Spark has already made such an assumption in some places, for example:
   
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/GroupStateImpl.scala#L161-L168
   
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala#L29-L31
   
https://github.com/apache/spark/blob/18b7ad2fc5fd2b215c10e0af086a025b3a1bd0fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L631
   Other DBMSs like PostgreSQL also have a predefined constant month duration:
   
https://github.com/postgres/postgres/blob/97c39498e5ca9208d3de5a443a2282923619bf91/src/include/datatype/timestamp.h#L77
 which is used in the calculation of interval length: 
https://github.com/postgres/postgres/blob/bffe1bd68457e43925c362d8728ce3b25bdf1c94/src/backend/utils/adt/timestamp.c#L5016-L5022
   
   I just want to say that we can make the same assumption that other DBMSs do.
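   To make the assumption concrete, here is a hypothetical sketch of a conversion to `java.time.Duration` that mirrors PostgreSQL's `DAYS_PER_MONTH = 30` constant; it is not Spark's actual code, and the names are illustrative:
```scala
import java.time.Duration

// Assumed fixed month length, mirroring PostgreSQL's DAYS_PER_MONTH constant.
val SECONDS_PER_DAY   = 24L * 60 * 60
val DAYS_PER_MONTH    = 30L
val SECONDS_PER_MONTH = DAYS_PER_MONTH * SECONDS_PER_DAY  // 2,592,000 seconds

// Hypothetical conversion of a (months, microseconds) interval to a Duration
// under the fixed-month-length assumption.
def toDuration(months: Int, microseconds: Long): Duration =
  Duration.ofSeconds(months * SECONDS_PER_MONTH)
    .plusNanos(microseconds * 1000L)
```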
   
   > If you really want to go down this path than we need to make sure an 
interval can only be month or micro second based.
   
   This is what I have been trying to say in many PRs like #25022 and 
https://github.com/apache/spark/pull/25981#discussion_r332571726
   
   One more thing: where do intervals in Spark come from? They can be specified in SQL queries (via interval literals written by a user; they cannot be loaded from datasources because the interval type is not supported there for now) or appear as the result of `timestamp1 - timestamp2`. The latter (timestamp subtraction) always sets `months` to zero: 
https://github.com/apache/spark/blob/eecef75350efda3efa7c5acb3b5eaa0d2fc8a552/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2111
   So, `months` can be non-zero only if a user specifies it in a SQL query, and your objection applies only to that use case. If the user has strong concerns about the average month duration, they could change the interval literal, or we could fail the query when the `months` component is non-zero in the conversion to `Duration`, as sketched below. This is arguable, but we could introduce a special SQL config for this and fail the query with a clear error message, if you insist.
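   A rough sketch of that stricter option (the function name and signature are hypothetical, not an existing Spark API):
```scala
import java.time.Duration

// Refuse to convert an interval whose `months` component is non-zero,
// since months have no fixed length.
def toDurationStrict(months: Int, microseconds: Long): Duration = {
  require(months == 0,
    "Cannot convert an interval with a non-zero months component to a java.time.Duration")
  Duration.ofSeconds(microseconds / 1000000L, (microseconds % 1000000L) * 1000L)
}
```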
