HyukjinKwon edited a comment on pull request #33875: URL: https://github.com/apache/spark/pull/33875#issuecomment-909834254
Just to clarify a bit more, the Arrow specification describes it as below (previously it was documented as a local datetime):

> If a Timestamp column has no timezone value, its epoch is 1970-01-01 00:00:00 (January 1st 1970, midnight) in an \*unknown\* timezone.

> In particular, it is \*not\* possible to interpret an unset or empty timezone as the same as "UTC"

which I believe is inspired by naive datetime vs. aware datetime: https://docs.python.org/3/library/datetime.html#aware-and-naive-objects:

> Because naive `datetime` objects are treated by many `datetime` methods as local times

Here is Spark's take:

- With `TimestampType`, we will interpret a naive `datetime` as a local time (a.k.a. `TIMESTAMP WITH LOCAL TIME ZONE`)
- With `TimestampNTZType`, we will interpret a naive `datetime` as a time in an unknown timezone (a.k.a. `TIMESTAMP WITHOUT TIME ZONE`), and compute on it without regard to the local (session) timezone
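
The two readings of a naive `datetime` can be sketched in plain Python. This is only an illustration using the standard library, not Spark's implementation: the helper `as_instant` and the fixed-offset `session_tz` values are hypothetical stand-ins for the session time zone.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime: wall-clock fields only, no tzinfo attached.
naive = datetime(2021, 9, 1, 12, 0, 0)
assert naive.tzinfo is None


def as_instant(naive_dt, session_tz):
    """Hypothetical helper: read a naive datetime as local time in session_tz
    (the TimestampType-like reading), which yields a fixed point in time."""
    return naive_dt.replace(tzinfo=session_tz)


utc = timezone.utc
plus_nine = timezone(timedelta(hours=9))  # fixed-offset stand-in for a session time zone

# TimestampType-like reading: the same wall-clock value maps to different
# instants depending on the session time zone.
instant_utc = as_instant(naive, utc)
instant_plus_nine = as_instant(naive, plus_nine)
gap = instant_utc - instant_plus_nine  # subtraction of aware datetimes compares instants

# TimestampNTZType-like reading: the value stays a wall-clock time in an
# unknown time zone, and arithmetic ignores any session time zone.
wall_clock_plus_hour = naive + timedelta(hours=1)

print(gap)
print(wall_clock_plus_hour)
```

Under the first reading, changing the session time zone changes which instant the value denotes; under the second, the value never denotes an instant, so the session time zone has no effect on it.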
