Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19250
  
    What's the interoperability issue with Impala? I think both Spark and 
Impala store timestamps as parquet INT96, representing nanoseconds from epoch, 
so there is no timezone confusion. Internally Spark uses a long to store a 
timestamp, representing microseconds from epoch, so we don't and shouldn't 
consider the timezone when reading a parquet INT96 timestamp.
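    
    To make that concrete, here is a conceptual sketch (not Spark's actual 
parquet reader code) of how a nanoseconds-from-epoch value maps to Spark's 
internal microseconds-from-epoch long; it is pure integer scaling, and no 
timezone enters the conversion:
    ```
    // Conceptual sketch only, not Spark's actual parquet reader code:
    // nanoseconds from epoch -> Spark's internal microseconds-from-epoch long.
    // There is no timezone involved in this conversion.
    def nanosToMicros(nanosFromEpoch: Long): Long = nanosFromEpoch / 1000L
    
    // e.g. 1,000,000,000 ns from epoch -> 1,000,000 us, i.e. 1970-01-01 00:00:01 UTC
    val micros = nanosToMicros(1000L * 1000 * 1000)
    ```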
    
    I think your problem may be about display. When Spark displays a timestamp 
value, e.g. via `df.show`, we convert the internal long value to a standard 
timestamp string according to the session-local timezone. Some examples:
    ```
    // 1000 milliseconds from epoch, no timezone confusion
    scala> val df = Seq(new java.sql.Timestamp(1000)).toDF("ts")
    df: org.apache.spark.sql.DataFrame = [ts: timestamp]
    
    scala> spark.conf.set("spark.sql.session.timeZone", "GMT")
    
    scala> df.show
    +-------------------+
    |                 ts|
    +-------------------+
    |1970-01-01 00:00:01|
    +-------------------+
    
    scala> spark.conf.set("spark.sql.session.timeZone", "PST")
    
    scala> df.show
    +-------------------+
    |                 ts|
    +-------------------+
    |1969-12-31 16:00:01|
    +-------------------+
    ```
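    
    For what it's worth, a quick sanity check (a rough sketch in the same 
spark-shell session as above; the exact output formatting may differ) is to 
cast the timestamp to a long, which gives seconds from epoch and is the same 
under both timezone settings, showing that only the display changes:
    ```
    scala> df.select($"ts".cast("long").as("seconds_from_epoch")).show
    +------------------+
    |seconds_from_epoch|
    +------------------+
    |                 1|
    +------------------+
    ```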
    
    This behavior makes sense to me, but may not be SQL-compliant. A clean 
solution is to add a `TIMESTAMP WITH TIMEZONE` type, so that when we convert 
the internal long value to a string, we know which timezone to use.
    
    Your proposal seems to hack the internal long value and lie to Spark about 
the microseconds from epoch, which doesn't look good.

