cloud-fan edited a comment on issue #25022: [SPARK-24695][SQL] Move 
`CalendarInterval` to org.apache.spark.sql.types package
URL: https://github.com/apache/spark/pull/25022#issuecomment-533488679
 
 
   IIUC, the problem with #25678 is that adding a new data type is too much 
work. We should only do that when a strong use case justifies it.
   
   The interval type is an existing data type, but it hasn't been fully 
exposed to end users yet. It's not much work to get it right and fully expose 
it.
   
   In general, I'm in favor of Java's design. `Period` is a conceptual interval 
while `Duration` is a concrete interval. This design is more powerful than the 
SQL standard's year-month interval + day-time interval because it supports 
conceptual days (a day that is not always 24 hours long, e.g. across a DST 
change).
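
   To illustrate the difference between a conceptual day and a concrete 24 
hours, here is a minimal sketch using only `java.time`. The class name 
`ConceptualDayDemo`, the date and the zone are arbitrary choices for 
illustration and are not part of Spark.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo class, not part of Spark.
final class ConceptualDayDemo {
    // Noon on the day before the 2019 US spring-forward transition (2019-03-10).
    static ZonedDateTime start() {
        return ZonedDateTime.of(
            LocalDateTime.of(2019, 3, 9, 12, 0), ZoneId.of("America/Los_Angeles"));
    }

    // Period is applied on the local time line: the wall-clock hour is preserved.
    static ZonedDateTime plusOneConceptualDay() {
        return start().plus(Period.ofDays(1));
    }

    // Duration is applied on the instant time line: exactly 24 * 3600 seconds elapse.
    static ZonedDateTime plus24Hours() {
        return start().plus(Duration.ofHours(24));
    }

    public static void main(String[] args) {
        System.out.println(plusOneConceptualDay()); // wall clock still reads 12:00
        System.out.println(plus24Hours());          // wall clock reads 13:00
    }
}
```

   The two results differ by one hour because the conceptual day that 
contains the DST change is only 23 hours long.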
   
   Spark should support all datetime operations (e.g. date + interval, 
timestamp + interval, etc.) w.r.t. the session local timezone. A UDF can do the 
same thing by getting the session local timezone from SQLConf, but I don't 
think that's a common use case, as users should call Spark functions for 
datetime operations.
   
   For example, `timestamp + interval` can be implemented as follows:
   1. convert the internal long value to `Instant`
   2. convert `Instant` to `ZonedDateTime`
   3. extract a `Period` (months and days) from `CalendarInterval`
   4. add the `Period` to the `ZonedDateTime`
   5. convert the updated `ZonedDateTime` back to `Instant`
   6. extract `Duration` (seconds) from `CalendarInterval`
   7. add the `Duration` to the `Instant`
   8. convert the updated `Instant` back to a long value
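
   The eight steps above could be sketched roughly as below. This is only an 
illustration, not Spark's implementation: the class and method names are 
hypothetical, the interval is passed as three plain numbers instead of a 
`CalendarInterval`, and the internal long is assumed to be microseconds since 
the epoch.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Spark.
final class TimestampPlusInterval {
    // Assumption: `micros` is microseconds since 1970-01-01T00:00:00Z.
    static long add(long micros, int months, int days, long seconds, ZoneId sessionZone) {
        // 1. convert the internal long value to Instant
        Instant instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
        // 2. convert Instant to ZonedDateTime in the session local timezone
        ZonedDateTime zoned = instant.atZone(sessionZone);
        // 3. extract a Period (months and days) from the interval
        Period period = Period.of(0, months, days);
        // 4. add the Period to the ZonedDateTime (calendar arithmetic)
        ZonedDateTime shifted = zoned.plus(period);
        // 5. convert the updated ZonedDateTime back to Instant
        Instant afterPeriod = shifted.toInstant();
        // 6. extract a Duration (seconds) from the interval
        Duration duration = Duration.ofSeconds(seconds);
        // 7. add the Duration to the Instant (exact elapsed time)
        Instant result = afterPeriod.plus(duration);
        // 8. convert the updated Instant back to a long value
        return ChronoUnit.MICROS.between(Instant.EPOCH, result);
    }
}
```

   With this ordering the months and days are resolved against the session 
timezone first, and the seconds are then added as exact elapsed time.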
   
   `CalendarInterval` should contain 3 ints for months, days and seconds. We 
can add some methods to `CalendarInterval` to extract the `Duration` and 
`Period`, so that it's easier to use in UDFs.
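
   As a rough sketch of that shape (not the actual `CalendarInterval` class), 
the three fields and the two extraction methods might look like this. The 
names `IntervalSketch`, `extractPeriod` and `extractDuration` are hypothetical.

```java
import java.time.Duration;
import java.time.Period;

// Hypothetical stand-in for the proposed layout; not Spark's CalendarInterval.
final class IntervalSketch {
    final int months;
    final int days;
    final int seconds;

    IntervalSketch(int months, int days, int seconds) {
        this.months = months;
        this.days = days;
        this.seconds = seconds;
    }

    // Conceptual part: months and days, resolved against a calendar and timezone.
    Period extractPeriod() {
        return Period.of(0, months, days);
    }

    // Concrete part: a fixed amount of elapsed time.
    Duration extractDuration() {
        return Duration.ofSeconds(seconds);
    }
}
```

   A UDF could then apply `extractPeriod()` to a `ZonedDateTime` and 
`extractDuration()` to an `Instant` without decoding the fields itself.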
   
   I think it's better to have a single interval type, because it:
   1. simplifies the type system
   2. supports conceptual days
   3. is still compatible with the SQL standard
   4. is compatible with Parquet
   
   The only disadvantage is that we can't sort intervals, but I don't think 
that matters.
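
   To see why such intervals have no total order, consider "1 month" versus 
"30 days": which one is longer depends on the date they are added to. A small 
sketch with `java.time` follows (the class and method names are hypothetical):

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical demo class, not part of Spark.
final class IntervalOrderingDemo {
    // Compares two periods by applying both to the same anchor date.
    // Returns -1, 0 or 1 depending on which resulting date comes first.
    static int compareAt(LocalDate anchor, Period a, Period b) {
        return Integer.signum(anchor.plus(a).compareTo(anchor.plus(b)));
    }

    public static void main(String[] args) {
        Period oneMonth = Period.ofMonths(1);
        Period thirtyDays = Period.ofDays(30);
        // From 2019-02-01: one month ends on Mar 1, thirty days ends on Mar 3.
        System.out.println(compareAt(LocalDate.of(2019, 2, 1), oneMonth, thirtyDays));
        // From 2019-03-01: one month ends on Apr 1, thirty days ends on Mar 31.
        System.out.println(compareAt(LocalDate.of(2019, 3, 1), oneMonth, thirtyDays));
    }
}
```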

