[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

squito Thu, 16 Nov 2017 13:12:07 -0800

GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/19769


    [SPARK-12297][SQL] Adjust timezone for int96 data from impala

    ## What changes were proposed in this pull request?
    
    Int96 data written by Impala is stored slightly differently from data
    written by Hive and Spark: the writers apply a different timezone offset.
    This adds an option, "spark.sql.parquet.int96TimestampConversion" (false
    by default), that adjusts timestamps if and only if the writer is Impala
    (more precisely, if the Parquet file's "createdBy" metadata does not start
    with "parquet-mr"). This matches the existing behavior in Hive from
    HIVE-9482.
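
A minimal Python sketch of the read-side decision described above, for illustration only. All helper names (`decode_int96`, `needs_int96_conversion`, `adjust_impala_timestamp`, `read_int96_timestamp`) are hypothetical and are not Spark's API; Spark's real logic lives in its Scala/Java Parquet readers. The sketch assumes the usual INT96 layout (8-byte little-endian nanoseconds-of-day followed by a 4-byte little-endian Julian day), assumes Impala stored the wall-clock time as if it were UTC, and uses a fixed-offset zone in place of the session time zone (so DST is ignored).

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Optional

JULIAN_DAY_OF_EPOCH = 2440588        # Julian day number of 1970-01-01
MICROS_PER_DAY = 86_400_000_000


def decode_int96(raw: bytes) -> int:
    """Decode a 12-byte INT96 value into microseconds since the Unix epoch."""
    if len(raw) != 12:
        raise ValueError("INT96 value must be 12 bytes")
    # '<qi': little-endian int64 nanos-of-day, then int32 Julian day (no padding).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_EPOCH) * MICROS_PER_DAY + nanos_of_day // 1000


def encode_int96(micros: int) -> bytes:
    """Inverse of decode_int96 (hypothetical helper, used here to build test data)."""
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1000, days + JULIAN_DAY_OF_EPOCH)


def needs_int96_conversion(created_by: Optional[str], conversion_enabled: bool) -> bool:
    """Adjust only when the option is on and the writer is not parquet-mr."""
    return conversion_enabled and not (created_by or "").startswith("parquet-mr")


def adjust_impala_timestamp(utc_micros: int, session_tz: timezone) -> int:
    """Reinterpret the UTC wall-clock reading of utc_micros as wall-clock time in session_tz."""
    wall_clock = datetime(1970, 1, 1) + timedelta(microseconds=utc_micros)  # naive UTC reading
    local = wall_clock.replace(tzinfo=session_tz)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (local - epoch) // timedelta(microseconds=1)


def read_int96_timestamp(raw: bytes, created_by: Optional[str],
                         conversion_enabled: bool, session_tz: timezone) -> int:
    """Decode an INT96 value, adjusting it only for non-parquet-mr writers."""
    micros = decode_int96(raw)
    if needs_int96_conversion(created_by, conversion_enabled):
        micros = adjust_impala_timestamp(micros, session_tz)
    return micros
```

With the option enabled, a file whose `createdBy` starts with "parquet-mr" is read unchanged, while a file from any other writer has its wall-clock reading shifted into the session zone. Keeping the option off by default preserves the existing behavior for files Spark itself wrote.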
    
    ## How was this patch tested?
    
    Added a unit test; existing tests were run via Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark SPARK-12297_skip_conversion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19769
    
----
commit e36453c2118fa3c45f424536ea787a95f0328687
Author: Imran Rashid <[email protected]>
Date:   2017-11-15T18:44:10Z

    wip

commit 2c71453cead0025deaeadddbbad1c3a79e49a99f
Author: Imran Rashid <[email protected]>
Date:   2017-11-15T22:48:18Z

    test and a bunch of plumbing in place

commit 592663393686d31a1d759e0e69540c10ee99dc69
Author: Imran Rashid <[email protected]>
Date:   2017-11-16T18:38:10Z

    works, needs cleanup

commit e9ecd16defd82e9a33fc31fa200f961d2dad9d2e
Author: Imran Rashid <[email protected]>
Date:   2017-11-16T19:35:24Z

    cleanup, test for predicate pushdown

----

