Tim Armstrong created IMPALA-10491:
--------------------------------------

             Summary: Impala parquet scanner should use writer.time.zone when converting Hive timestamps
                 Key: IMPALA-10491
                 URL: https://issues.apache.org/jira/browse/IMPALA-10491
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 3.4.0
            Reporter: Tim Armstrong


IMPALA-8721 reports some issues with Hive 3 and timezone conversion.

HIVE-21290 fixed some of the issues and also sets writer.time.zone in the 
Parquet metadata, which provides a better way to determine which time zone 
the writer used. E.g.

{noformat}
tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/000000_0
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file:        hdfs://localhost:20500/test-warehouse/asdfgh/000000_0
creator:     parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936)
extra:       writer.date.proleptic = false
extra:       writer.time.zone = America/Los_Angeles
extra:       writer.model.name = 3.1.3000.7.2.7.0-44
{noformat}

I think we should use this time zone when converting timestamps, either always 
or only when convert_legacy_hive_parquet_utc_timestamps=true.
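
To make the two options concrete, here is a rough Python sketch of the zone selection. It is not Impala code. The helper names (pick_conversion_zone, convert_timestamp), the only_with_legacy_flag parameter, and the fallback to a caller-supplied local zone are hypothetical. It also assumes the stored value is UTC-normalized and that the reader wants the wall-clock time in the writer's zone.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional
from zoneinfo import ZoneInfo


def pick_conversion_zone(
    file_metadata: Mapping[str, str],
    convert_legacy: bool,
    local_zone: str,
    only_with_legacy_flag: bool = False,
) -> Optional[ZoneInfo]:
    """Choose the zone used to convert a UTC-normalized timestamp.

    Hypothetical helper: `only_with_legacy_flag` switches between the two
    options in the issue (always honor writer.time.zone, or honor it only
    when convert_legacy_hive_parquet_utc_timestamps=true).
    """
    writer_zone = file_metadata.get("writer.time.zone")
    if writer_zone and (convert_legacy or not only_with_legacy_flag):
        return ZoneInfo(writer_zone)
    if convert_legacy:
        # Assumed fallback when the file carries no writer.time.zone.
        return ZoneInfo(local_zone)
    return None  # No conversion: return the stored value unchanged.


def convert_timestamp(stored_utc: datetime, zone: Optional[ZoneInfo]) -> datetime:
    """Convert a naive UTC timestamp to naive wall-clock time in `zone`."""
    if zone is None:
        return stored_utc
    aware = stored_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(zone).replace(tzinfo=None)


# Key-value metadata as shown in the parquet-tools output above.
metadata = {
    "writer.date.proleptic": "false",
    "writer.time.zone": "America/Los_Angeles",
}
zone = pick_conversion_zone(metadata, convert_legacy=False, local_zone="UTC")
local_value = convert_timestamp(datetime(2021, 2, 8, 20, 26, 44), zone)
```

With only_with_legacy_flag=True and convert_legacy=False, the same call returns None and the stored value passes through unconverted, which corresponds to the second option.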

CC [~boroknagyz] [~csringhofer]



