cloud-fan commented on PR #53161:
URL: https://github.com/apache/spark/pull/53161#issuecomment-3581415846

   @gaogaotiantian The key point about Spark `TimestampType` is that it represents an absolute 
instant in time. The session timezone only matters when we render the timestamp without 
a timezone (e.g. `df.show`, a cast to string, or functions that extract the 
year/month/.../second fields from a timestamp).
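   To illustrate with plain Python (no Spark needed), here is a minimal sketch of the same idea: one absolute instant renders with different field values depending on which timezone you view it in. The two example timezones are arbitrary choices, not anything from this PR:

   ```python
   from datetime import datetime, timezone
   from zoneinfo import ZoneInfo

   # One absolute instant: 1990-08-10 00:00:00 UTC.
   instant = datetime(1990, 8, 10, 0, 0, tzinfo=timezone.utc)

   # Rendering the same instant under different "session" timezones changes
   # only the displayed year/month/day/hour fields, not the instant itself.
   print(instant.astimezone(ZoneInfo("America/Los_Angeles")))  # 1990-08-09 17:00:00-07:00
   print(instant.astimezone(ZoneInfo("Asia/Shanghai")))        # 1990-08-10 08:00:00+08:00
   ```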
   
   For the case of `df = spark.createDataFrame([(datetime.datetime(1990, 8, 10, 
0, 0),)], ["ts"])`, we use a specific session `spark` to create the dataframe, 
so naturally we should respect its session timezone: we convert 
`datetime.datetime(1990, 8, 10, 0, 0)` to an absolute timestamp by attaching 
the session timezone to it. Moreover, we can have a mix of Python 
`datetime.datetime` objects with different timezones or no timezone at all, and 
that's fine because all of them can still be converted to absolute timestamps.
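   A minimal sketch of that conversion in plain Python (the `SESSION_TZ` value and the `to_micros` helper are hypothetical, just to show the mechanics): naive datetimes get the session timezone attached, aware ones keep their own, and every value ends up as microseconds since the UTC epoch.

   ```python
   from datetime import datetime, timedelta, timezone
   from zoneinfo import ZoneInfo

   EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
   SESSION_TZ = ZoneInfo("America/Los_Angeles")  # assumed session timezone

   def to_micros(dt: datetime) -> int:
       # Naive datetimes are interpreted in the session timezone;
       # timezone-aware datetimes keep the timezone they carry.
       if dt.tzinfo is None:
           dt = dt.replace(tzinfo=SESSION_TZ)
       return (dt - EPOCH) // timedelta(microseconds=1)

   mixed = [
       datetime(1990, 8, 10, 0, 0),                                    # naive
       datetime(1990, 8, 10, 0, 0, tzinfo=timezone.utc),               # UTC
       datetime(1990, 8, 10, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai")),  # +08:00
   ]
   for m in map(to_micros, mixed):
       print(m)
   ```

   Three different wall-clock readings of "1990-08-10 00:00" thus become three different absolute microsecond counts, which is exactly why a mixed column is unambiguous.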
   
   A similar example is reading a JDBC table that contains a column with the 
standard TIMESTAMP WITH TIME ZONE type. Each value can carry a different 
timezone, but it's still fine to read the column as Spark `TimestampType`, 
because every value can be converted to an absolute timestamp.
   
   Under the hood, `TimestampType` is stored as an int64 in memory: the 
number of microseconds since the UTC epoch (`1970-01-01 00:00:00Z`).
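   Decoding such an int64 back to an instant is just epoch arithmetic; a quick sketch with a hypothetical stored value:

   ```python
   from datetime import datetime, timedelta, timezone

   EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

   micros = 649_468_800_000_000  # hypothetical stored int64 value
   instant = EPOCH + timedelta(microseconds=micros)
   print(instant)  # 1990-08-10 00:00:00+00:00
   ```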


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
