GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/6617

    [SPARK-6777] [SQL] Implements backwards compatibility rules in 
CatalystSchemaConverter

    This PR introduces `CatalystSchemaConverter` for converting Parquet schemas 
to Spark SQL schemas and vice versa. The original conversion code in 
`ParquetTypesConverter` is removed. Benefits of the new version are:
    
    1. When converting Spark SQL schemas, it generates standard Parquet schemas 
conforming to [the most recent Parquet format spec] [1].
    
       Note that although this version of the Parquet format spec hasn't been 
officially released yet, Parquet MR 1.7.0 already follows it, so it should be 
safe to rely on.
    
    1. It implements the backwards-compatibility rules described in the most 
recent Parquet format spec, so it can recognize more schema patterns generated 
by other or legacy systems and tools.
    1. Code organization follows the convention used in [parquet-mr] [2], which 
makes it easier to follow. (The structure of `CatalystSchemaConverter` is 
similar to `AvroSchemaConverter`.)
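    
    As an illustration of the kind of rule item 2 refers to, here is a minimal 
Python sketch of how the element type of a LIST-annotated group could be 
resolved under the backwards-compatibility rules for lists in the Parquet 
format spec, as I read them. The `Primitive` and `Group` classes and the 
`resolve_list_element` function are hypothetical stand-ins invented for this 
example. They are not parquet-mr classes and not the code in this PR.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, simplified stand-ins for Parquet schema nodes.
# These are NOT parquet-mr or Spark classes.
@dataclass(frozen=True)
class Primitive:
    name: str
    repetition: str  # "required", "optional", or "repeated"
    type_name: str   # e.g. "int32", "binary"


@dataclass(frozen=True)
class Group:
    name: str
    repetition: str
    fields: Tuple["Field", ...]


Field = Union[Primitive, Group]


def resolve_list_element(list_group: Group) -> Tuple[Field, bool]:
    """Return (element_type, contains_null) for a LIST-annotated group."""
    if len(list_group.fields) != 1 or list_group.fields[0].repetition != "repeated":
        raise ValueError("a LIST group must contain exactly one repeated field")
    repeated = list_group.fields[0]

    # Rule 1: a repeated primitive is the element itself; elements are required.
    if isinstance(repeated, Primitive):
        return repeated, False
    if not repeated.fields:
        raise ValueError("the repeated group must contain at least one field")
    # Rule 2: a repeated group with several fields is the element (a struct).
    if len(repeated.fields) > 1:
        return repeated, False
    # Rule 3: a single-field group named "array" or "<list name>_tuple"
    # is the element itself (a legacy naming pattern).
    if repeated.name in ("array", list_group.name + "_tuple"):
        return repeated, False
    # Rule 4: otherwise the repeated group is the synthetic middle layer of
    # the standard 3-level layout, and its only child is the element.
    child = repeated.fields[0]
    return child, child.repetition == "optional"


# Standard 3-level layout: optional group my_list (LIST) { repeated group list { optional int32 element } }
standard = Group("my_list", "optional", (
    Group("list", "repeated", (Primitive("element", "optional", "int32"),)),
))

# Legacy 2-level layout: optional group my_list (LIST) { repeated int32 element }
legacy_two_level = Group("my_list", "optional", (
    Primitive("element", "repeated", "int32"),
))

# Legacy layout with a single-field repeated group named "array".
legacy_array = Group("my_list", "optional", (
    Group("array", "repeated", (Primitive("str", "required", "binary"),)),
))

std_elem, std_nullable = resolve_list_element(standard)
two_elem, two_nullable = resolve_list_element(legacy_two_level)
arr_elem, arr_nullable = resolve_list_element(legacy_array)
```

    In this sketch the standard layout resolves to a nullable `int32` element, 
the 2-level layout to a non-nullable `int32` element, and the `array` layout to 
a non-nullable single-field struct.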
    
    To fully implement the backwards-compatibility rules in both the read and 
write paths, we also need to update `CatalystRowConverter` (which is responsible 
for converting Parquet records to `Row`s), `RowReadSupport`, and 
`RowWriteSupport`. These will be done in follow-up PRs.
    
    TODO
    
    - [ ] More schema conversion test cases for legacy schema patterns.
    
    [1]: https://github.com/apache/parquet-format/blob/ea095226597fdbecd60c2419d96b54b2fdb4ae6c/LogicalTypes.md
    [2]: https://github.com/apache/parquet-mr/

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-6777

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6617
    
----
commit 28a6a542e26733de97b824faae69f81ba08ddfb9
Author: Cheng Lian <[email protected]>
Date:   2015-06-03T18:28:58Z

    Implements backwards compatibility rules in CatalystSchemaConverter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

