cloud-fan edited a comment on issue #25022: [SPARK-24695][SQL] Move `CalendarInterval` to org.apache.spark.sql.types package
URL: https://github.com/apache/spark/pull/25022#issuecomment-533488679

IIUC, the problem with #25678 is that it's too much work to add a new data type, and we should only do that when a strong use case justifies it. The interval type is an existing data type that hasn't been fully exposed to end users yet. Making it right and fully exposing it is not that much work.

In general, I'm in favor of Java's design: `Period` is a conceptual interval, while `Duration` is a concrete interval. This design is more powerful than the SQL standard's year-month interval + day-time interval, as it supports conceptual days.

Spark should support all datetime operations (e.g. date + interval, timestamp + interval) w.r.t. the session local timezone. A UDF can do the same thing by getting the session local timezone from SQLConf, but I don't think that's a common use case, as users should call Spark functions to do datetime operations. For example, `timestamp + interval` can be implemented as follows:

1. convert the internal long value to an `Instant`
2. convert the `Instant` to a `ZonedDateTime`
3. extract a `Period` (months and days) from the `CalendarInterval`
4. add the `Period` to the `ZonedDateTime`
5. convert the updated `ZonedDateTime` back to an `Instant`
6. extract a `Duration` (seconds) from the `CalendarInterval`
7. add the `Duration` to the `Instant`
8. convert the updated `Instant` back to a long value

`CalendarInterval` should contain 3 ints for months, days and seconds. We can add some methods to `CalendarInterval` to extract the `Duration` and the `Period`, so that it's easier to use in UDFs.

I think it's better to have a single interval type, because it:

1. simplifies the type system
2. supports conceptual days
3. is still compatible with the SQL standard
4. is compatible with Parquet

The only disadvantage is that we can't sort intervals, but I don't think that matters.
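The eight `timestamp + interval` steps can be sketched in Java with `java.time`. This is a minimal illustration under stated assumptions, not Spark's implementation: `IntervalSketch`, `extractPeriod()`, `extractDuration()` and `addToTimestamp()` are hypothetical names standing in for the proposed three-int `CalendarInterval` and its helper methods; the internal timestamp is taken to be microseconds since the Unix epoch (as in Spark's `TimestampType`); and the session time zone is passed in as a `ZoneId` rather than read from SQLConf.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical stand-in for the proposed CalendarInterval (three ints:
// months, days, seconds). Not Spark's actual class or API.
final class IntervalSketch {
    final int months;
    final int days;
    final int seconds;

    IntervalSketch(int months, int days, int seconds) {
        this.months = months;
        this.days = days;
        this.seconds = seconds;
    }

    // Conceptual part: calendar months and days, whose real length depends
    // on the date and time zone they are applied in.
    Period extractPeriod() {
        return Period.of(0, months, days);
    }

    // Concrete part: an exact number of elapsed seconds.
    Duration extractDuration() {
        return Duration.ofSeconds(seconds);
    }

    // Assumption: the internal timestamp is microseconds since the epoch.
    // floorDiv/floorMod keep pre-1970 (negative) values correct.
    static Instant microsToInstant(long micros) {
        return Instant.ofEpochSecond(
            Math.floorDiv(micros, 1_000_000L),
            Math.floorMod(micros, 1_000_000L) * 1_000L);
    }

    // Inverse of microsToInstant; getNano() is always in [0, 1e9).
    static long instantToMicros(Instant instant) {
        return Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
            instant.getNano() / 1_000L);
    }

    // timestamp + interval, following the eight steps in the comment.
    static long addToTimestamp(long micros, IntervalSketch interval, ZoneId sessionZone) {
        Instant instant = microsToInstant(micros);          // 1. long -> Instant
        ZonedDateTime local = instant.atZone(sessionZone);  // 2. Instant -> ZonedDateTime
        Period period = interval.extractPeriod();           // 3. months and days
        ZonedDateTime shifted = local.plus(period);         // 4. add Period in the session zone
        Instant afterPeriod = shifted.toInstant();          // 5. back to Instant
        Duration duration = interval.extractDuration();     // 6. seconds
        Instant result = afterPeriod.plus(duration);        // 7. add Duration on the time line
        return instantToMicros(result);                     // 8. Instant -> long
    }

    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        // Noon PST on 2019-03-09, the day before the US switch to daylight saving time.
        long start = instantToMicros(Instant.parse("2019-03-09T20:00:00Z"));
        long plusOneDay = addToTimestamp(start, new IntervalSketch(0, 1, 0), la);
        long plus86400s = addToTimestamp(start, new IntervalSketch(0, 0, 86_400), la);
        System.out.println(microsToInstant(plusOneDay).atZone(la)); // conceptual day
        System.out.println(microsToInstant(plus86400s).atZone(la)); // exact 24 hours
    }
}
```

Running `main` shows why conceptual days matter: across the 2019-03-10 daylight-saving change in `America/Los_Angeles`, adding one conceptual day keeps the wall-clock time at noon (only 23 hours elapse), while adding a `Duration` of 86,400 seconds lands at 13:00 local time. Applying the `Period` before the `Duration` mirrors the order of the steps in the comment.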
