Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19250

What's the interoperability issue with Impala? I think both Spark and Impala store timestamps as parquet INT96, representing nanoseconds from epoch, so there is no timezone confusion. Internally Spark uses a long to store a timestamp, representing microseconds from epoch, so we don't and shouldn't consider timezone when reading parquet INT96 timestamps.

I think your problem may be about display. When Spark displays a timestamp value, via `df.show`, we convert the internal long value to a standard timestamp string according to the session local timezone. Some examples:

```
// 1000 milliseconds from epoch, no timezone confusion
scala> val df = Seq(new java.sql.Timestamp(1000)).toDF("ts")
df: org.apache.spark.sql.DataFrame = [ts: timestamp]

scala> spark.conf.set("spark.sql.session.timeZone", "GMT")

scala> df.show
+-------------------+
|                 ts|
+-------------------+
|1970-01-01 00:00:01|
+-------------------+

scala> spark.conf.set("spark.sql.session.timeZone", "PST")

scala> df.show
+-------------------+
|                 ts|
+-------------------+
|1969-12-31 16:00:01|
+-------------------+
```

This behavior makes sense to me, but it may not be SQL-compliant. A clean solution is to add a `TIMESTAMP WITH TIMEZONE` type, so that when we convert the internal long value to a string, we know which timezone to use. Your proposal seems to hack the internal long value and lie to Spark about the microseconds from epoch, which doesn't look good.
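The point above — one fixed internal instant, two different rendered strings — can be sketched without Spark at all, using plain `java.time`. This is only an illustration of the display behavior described in the comment (a fixed epoch value formatted per zone), not Spark's actual internal code path:

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object TimestampDisplaySketch {
  def main(args: Array[String]): Unit = {
    // The same instant: 1000 milliseconds from epoch.
    // The internal value never changes; only the rendering does.
    val instant = Instant.ofEpochMilli(1000L)
    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

    // Render in two zones, mirroring what changing
    // spark.sql.session.timeZone does to `df.show` output.
    val gmt = fmt.format(instant.atZone(ZoneId.of("GMT")))
    val pst = fmt.format(instant.atZone(ZoneId.of("America/Los_Angeles")))

    println(gmt) // 1970-01-01 00:00:01
    println(pst) // 1969-12-31 16:00:01
  }
}
```

This is why adjusting the stored long to "fix" display is the wrong layer: the long is an absolute instant, and the timezone belongs to formatting (or, per the suggestion above, to a `TIMESTAMP WITH TIMEZONE` type).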