GitHub user liancheng commented on the pull request:
https://github.com/apache/spark/pull/8988#issuecomment-146685643
A note about interoperability:
Hive 1.2.1 can read Parquet arrays and maps written in the standard format.
However, it still doesn't recognize Parquet decimals stored as `INT32` or
`INT64` ([HIVE-12069][1]). There are two options to work around this issue:
1. Always turn on legacy mode if the written Parquet files are supposed to
be consumed by Hive (sketched at the end of this comment).
Legacy mode is turned off by default in this PR.
2. Add a separate SQL option `spark.sql.parquet.writeCompactDecimal` to
indicate whether decimals can be written as `INT32` and `INT64`.
This PR hasn't implemented this option yet. If we prefer this
approach, I can do it in another PR. We probably want this option to be
`false` by default.
I'd vote for 2.
@davies @marmbrus @rxin Thoughts?
[1]: https://issues.apache.org/jira/browse/HIVE-12069
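For concreteness, here's a minimal sketch of option 1 against the 1.5-era
`SQLContext` API. The config key `spark.sql.parquet.writeLegacyFormat` is the
name the legacy flag carries in released Spark versions; that key, the demo
schema, and the output path aren't spelled out in this comment, so treat them
as assumptions:
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object WriteParquetForHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("parquet-for-hive").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Force the legacy (pre-parquet-format-spec) layout, under which
    // decimals are written as FIXED_LEN_BYTE_ARRAY instead of INT32/INT64.
    // Assumed key name; it is not given in the comment above.
    sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")

    // DECIMAL(9, 2) fits in INT32 under the standard layout -- exactly the
    // case Hive 1.2.1 rejects (HIVE-12069).
    val df = sqlContext
      .createDataFrame(Seq(Tuple1(123.45), Tuple1(678.90)))
      .toDF("amount")
      .selectExpr("CAST(amount AS DECIMAL(9, 2)) AS amount")

    // With legacy mode on, Hive 1.2.1 can read the resulting files.
    df.write.parquet("/tmp/decimals_for_hive") // hypothetical output path

    sc.stop()
  }
}
```
Under option 2, the same write would instead be gated by the proposed
`spark.sql.parquet.writeCompactDecimal` flag, left `false` by default.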