cloud-fan commented on issue #26134: [SPARK-29486][SQL] CalendarInterval should have 3 fields: months, days and microseconds
URL: https://github.com/apache/spark/pull/26134#issuecomment-543273473
 
 
   > In Spark, timestamp is instant + session zone id
   
   I don't fully agree. A timestamp in Spark is an instant. When Spark parses a string to a timestamp (e.g. `TIMESTAMP '2019-1-1 12:12:12'`), it assumes the input string is a local datetime (unless it carries a timezone) and parses it using the session local timezone id. That's why Spark stores a timestamp as microseconds since the UTC epoch: it's an instant.
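   To make this concrete, here is a minimal Python sketch (using the stdlib `zoneinfo`, not actual Spark code) of how a zone-less timestamp literal becomes an instant via the session timezone; the function name `parse_timestamp` is illustrative, not a Spark API:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def parse_timestamp(literal: str, session_tz: str) -> int:
    """Parse a zone-less timestamp literal as a local datetime in the
    session timezone, returning microseconds since the UTC epoch
    (mirroring how Spark stores TimestampType internally)."""
    local = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
    instant = local.replace(tzinfo=ZoneInfo(session_tz))
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return int((instant - epoch).total_seconds() * 1_000_000)

# The same literal denotes different instants under different session zones:
utc_micros = parse_timestamp("2019-1-1 12:12:12", "UTC")
la_micros = parse_timestamp("2019-1-1 12:12:12", "America/Los_Angeles")
```

   Noon in Los Angeles (UTC-8 in January) is a later instant than noon in UTC, so the two results differ by exactly 8 hours of microseconds.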
   
   The session local timezone id is also used whenever Spark needs the datetime fields of an instant, e.g. casting to string, field-extraction functions, adding intervals, etc. In other words, many datetime operations in Spark rely on the session local timezone id.
   
   Now we have 2 choices:
   1. Keep the session local timezone id as it is. This creates the special requirement of having a days field in intervals.
   2. Change the session local timezone id to a zone offset. This makes some datetime functions hard to use for certain use cases.
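   The motivation for a days field under choice 1 is that, with a real timezone id, "1 day" is not always 24 hours. A minimal Python sketch (stdlib `zoneinfo`, not Spark code) across the 2019 US DST fall-back:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Los_Angeles")
# Noon on the day before US clocks fall back (2019-11-03 at 2am).
start = datetime(2019, 11, 2, 12, 0, tzinfo=tz)

# "1 day" as a calendar day: same wall-clock time on the next day.
# Python's aware-datetime arithmetic adds to the wall-clock fields.
calendar_day = start + timedelta(days=1)  # 2019-11-03 12:00 local

# "1 day" as exactly 24 hours of microseconds: add in UTC, convert back.
fixed_24h = (start.astimezone(timezone.utc)
             + timedelta(hours=24)).astimezone(tz)  # 2019-11-03 11:00 local
```

   The calendar-day result lands at noon but is 25 real hours later; the fixed-24-hour result is off by an hour on the wall clock. Only a separate days field lets an interval express the first behavior.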
   
   TIMESTAMP WITH LOCAL TIMEZONE is not a standard SQL type, so we are on our own here. To be honest, I'd like to make Spark follow the SQL standard and use TIMESTAMP WITHOUT TIMEZONE, but that changes the semantics and we don't know how to deal with existing timestamp data.
   
   I'm really on the fence. @MaxGekk do you have some better ideas?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services