Github user squito commented on the issue:
https://github.com/apache/spark/pull/19250
@cloud-fan I think you misunderstand the purpose of this change.
The primary purpose is actually to deal with parquet, where that option
doesn't do anything. We need this for parquet for two reasons:
1) **Interoperability with Impala**. Impala first used an int96 to store a
timestamp in parquet, and it always stored the time as UTC (to go with the SQL
standard definition of _timezone_). But spark (and hive) read it back in the
current timezone. Even when you don't change timezones, and the _timestamp
with time zone_ vs. _timestamp without time zone_ distinction doesn't matter,
you get different values before this change.
2) **SQL STANDARD TIMESTAMP**. SQL defines _timestamp_ to be a synonym for
_timestamp without time zone_. The behavior of that type is defined so if you
insert "08:30" with time zone "America/New_York", then load the data with time
zone "America/Los_Angeles", you should still see "08:30". Since parquet is
stored as an instant-in-time, and spark internally applies a timezone, the
change in timezone must be reversed by using some consistent adjustment when
saving and reloading. This doesn't give you real _timestamp without time
zone_, but gets you closer.
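To make point 2 concrete, here is a minimal sketch in plain Python (not spark's actual code path; the date and zones are arbitrary examples): parquet stores an instant-in-time, so a wall-clock "08:30" written under one session timezone renders as a different wall clock under another, unless a consistent adjustment zone is applied on both save and load.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")
LA = ZoneInfo("America/Los_Angeles")

# Writer session in New York: the naive "08:30" is interpreted in the
# session zone and stored as a UTC instant (what parquet keeps).
wall = datetime(2017, 9, 15, 8, 30)
stored_instant = wall.replace(tzinfo=NY).astimezone(timezone.utc)

# Reader session in Los Angeles, no adjustment: the instant is rendered
# in the new session zone, so the wall clock shifts.
unadjusted = stored_instant.astimezone(LA).replace(tzinfo=None)
print(unadjusted.time())  # 05:30:00 -- not the "08:30" that was written

# With a consistent adjustment zone (here, the writer's zone) applied on
# both save and load, the shift is reversed and "08:30" survives.
adjusted = stored_instant.astimezone(NY).replace(tzinfo=None)
print(adjusted.time())  # 08:30:00
```

This is only the arithmetic; it doesn't give real _timestamp without time zone_ semantics either, but it shows why the reversal has to use the same zone on both ends.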
To be honest, I see limited value in this change for formats other than
parquet -- I added it only because I thought Reynold wanted it (for symmetry
across formats, I suppose?). As the purpose of this is to *undo* timezones,
you can already achieve something similar in text-based formats by specifying a
format which leaves out the timezone. But it doesn't hurt.
We could reuse the "timezone" option for parquet for this purpose, but that
would be rather strange, as it's almost doing the opposite of what that
property does for text-based formats: that property is for adding a timezone,
and this is for "removing" it. It's doing something special enough that it
deserves a more specific name than just "timezone".
(This is all discussed at greater length in the design docs, including how
this type behaves in other sql engines, how spark's behavior is non-standard,
and how it changed in 2.0.1.)