gaogaotiantian commented on PR #52980:
URL: https://github.com/apache/spark/pull/52980#issuecomment-3530875432

   > That being said, we should never rely on the local machine timezone. We 
should either respect the session timezone (specified by 
spark.sql.session.timeZone and it has a default value if not set), or the 
python objects should be timezone agnostic.
   
   I totally agree with this - that's the point I'm trying to make. The local 
machine timezone should never affect the result of user code.
   
   Let's talk about timestamps. In Python, naive datetime objects (those 
without a timezone) are discouraged, because any operation that needs an 
absolute point in time interprets them as local time. I believe the actual 
internal storage uses an integer timestamp. When Python converts such an 
integer to a datetime via `datetime.datetime.fromtimestamp`, it treats the 
integer as a POSIX timestamp, so it needs a timezone to do the conversion; if 
none is given, it falls back to the local machine timezone. There is no real 
"timezone agnostic" datetime in Python - a datetime without a timezone is 
assumed to be in the local machine timezone.
   
   That being said, I think using UTC for TimestampNTZ is the correct 
implementation: if Python interprets the integer timestamp as UTC, the 
resulting wall-clock fields are the same on every machine, which gives the 
correct result.
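
   A sketch of what that conversion looks like (the `micros` value is 
hypothetical; TimestampNTZ stores a wall-clock value, and routing it through 
UTC reproduces those wall-clock fields on any machine):

   ```python
   from datetime import datetime, timezone

   micros = 1_700_000_000_000_000  # hypothetical internal TimestampNTZ value

   # Interpret the integer as UTC, then drop tzinfo: the naive result carries
   # the stored wall-clock fields regardless of the local machine timezone.
   ntz = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc).replace(tzinfo=None)

   # By contrast, the tz-less default would shift the wall clock differently
   # on every executor:
   #   datetime.fromtimestamp(micros / 1_000_000)  # machine-dependent
   ```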
   
   However, for timestamps with a timezone, we should use either the session 
config or at least a value that is consistent across all executors (the 
driver timezone would be a good candidate; UTC is another option).
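
   For illustration, a sketch of the session-timezone option (the config 
plumbing is assumed here; `session_tz` stands in for whatever 
spark.sql.session.timeZone resolves to, propagated from the driver so every 
executor agrees):

   ```python
   from datetime import datetime
   from zoneinfo import ZoneInfo  # stdlib since Python 3.9

   # Assumption: the session timezone is resolved once on the driver and
   # shipped to every executor, so all of them convert identically.
   session_tz = "America/Los_Angeles"  # placeholder value

   micros = 1_700_000_000_000_000  # hypothetical internal timestamp value
   aware = datetime.fromtimestamp(micros / 1_000_000, tz=ZoneInfo(session_tz))
   ```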

