Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19250
Ah, now I understand this issue. Yes, Spark doesn't follow the SQL standard:
the Spark timestamp is actually TIMESTAMP WITH LOCAL TIME ZONE, which is not
in the SQL standard but is used in some databases like
[Oracle](https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions079.htm).
Although Impala follows the SQL standard, it doesn't follow the Parquet standard;
that's why we need to deal with the Parquet INT96 issue here. I think we can
follow what Hive/Impala did for interoperability, i.e. add a config that makes
the Parquet reader interpret INT96 as a timezone-agnostic timestamp.
However, I'm less sure about the `parquet.timezone-adjustment` table
property. Is this a standard published somewhere? Do Impala and Hive both
respect it? I think we need people from both Impala and Hive to say YES to this
proposal.
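
To make the ambiguity concrete: a Parquet INT96 value is 12 bytes, a little-endian nanoseconds-of-day field followed by a Julian day number. The bytes themselves carry no time zone, so the disagreement is purely about interpretation: read the stored fields as UTC (timezone-agnostic, the Impala reading), or shift them by the writer's local zone (the Hive/Spark reading). A minimal sketch of the timezone-agnostic decode, using only the standard library (the function name and structure here are illustrative, not Spark's actual reader code):

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_EPOCH_DAY = 2440588

def int96_to_utc(raw: bytes) -> datetime:
    """Decode a 12-byte Parquet INT96 timestamp, timezone-agnostic style.

    Layout (little-endian): 8 bytes nanoseconds-of-day, then 4 bytes
    Julian day. The stored fields are taken as UTC wall-clock values,
    with no writer-zone adjustment.
    """
    nanos, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_EPOCH_DAY
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        days=days, microseconds=nanos // 1000
    )

# Example: Julian day 2440588 with 0 nanoseconds is the epoch itself.
raw = struct.pack("<qi", 0, JULIAN_EPOCH_DAY)
print(int96_to_utc(raw))  # 1970-01-01 00:00:00+00:00
```

The zone-adjusting reading would take the same decoded wall-clock fields and reinterpret them in the writer's local zone instead of UTC, which is exactly why a file written by one engine can read back shifted in another.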