cloud-fan commented on issue #26134: [SPARK-29486][SQL] CalendarInterval should have 3 fields: months, days and microseconds
URL: https://github.com/apache/spark/pull/26134#issuecomment-543273473

> In Spark, timestamp is instant + session zone id

I don't fully agree. A timestamp in Spark is an instant. When Spark parses a string to a timestamp (e.g. `TIMESTAMP '2019-1-1 12:12:12'`), it assumes the input string is a local datetime (unless it carries a timezone) and parses it using the session local timezone id. That's why Spark stores a timestamp as microseconds from the UTC epoch: because it's an instant.

The session local timezone id is also used whenever Spark needs to know the datetime fields of the instant, e.g. casting to string, field-extraction functions, adding an interval, etc. That said, a lot of datetime operations in Spark rely on the session local timezone id.

Now we have 2 choices:
1. Keep the session local timezone id as it is. This raises the special requirement to have a day field in the interval.
2. Change the session local timezone id to a zone offset. This makes some datetime functions hard to use for some use cases.

TIMESTAMP WITH LOCAL TIMEZONE is not a standard SQL type, so we are on our own here. To be honest, I'd like to make Spark follow the SQL standard and use TIMESTAMP WITHOUT TIMEZONE, but that changes the semantics and we don't know how to deal with existing timestamp data.

I'm really on the fence. @MaxGekk do you have some better ideas?
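To illustrate why choice 1 implies a separate day field: with a DST-aware session timezone, "plus 1 day" (a wall-clock shift) and "plus 24 hours" (an absolute shift of the instant) give different results across a DST transition, so an interval cannot fold days into microseconds. This is a minimal sketch of that difference using Python's stdlib `zoneinfo` (not Spark code; the timezone and dates are just an example around the US fall-back on 2019-11-03):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Los_Angeles")

# Noon the day before DST ends (PDT, UTC-7).
t = datetime(2019, 11, 2, 12, 0, tzinfo=tz)

# Wall-clock "+1 day": same local time next day, now in PST (UTC-8).
one_day = t + timedelta(days=1)            # 2019-11-03 12:00 local

# Absolute "+24 hours": shift the instant, then view it in the same zone.
hours_24 = (t.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(tz)
# → 2019-11-03 11:00 local, because clocks fell back one hour.

# As instants, the two results differ by exactly one hour:
gap = one_day.astimezone(timezone.utc) - hours_24.astimezone(timezone.utc)
print(one_day.hour, hours_24.hour, gap)    # 12 11 1:00:00
```

A microseconds-only interval can only express the "+24 hours" behaviour; keeping the session timezone id (rather than a fixed offset) is what makes the wall-clock "+1 day" behaviour possible, and that is the behaviour a `days` field would encode.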
