[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

liancheng Sun, 28 Sep 2014 06:06:07 -0700

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/2563


    [SPARK-3713][SQL] Uses JSON to serialize DataType objects

    This PR uses JSON instead of `toString` to serialize `DataType`s. The
    `toString` output is not only hard to parse but also fragile in many cases.

    Since schema information has already been written to Parquet metadata in
    the old style, we have to keep the old `DataType` parser to ensure
    backward compatibility. The old parser is now renamed to
    `CaseClassStringParser` and moved into `object DataType`.
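
To illustrate the idea only, here is a minimal, self-contained Python sketch of "try JSON first, fall back to the legacy string parser". It is not the code in this PR: the dict-based type model, the JSON layout, the simplified legacy grammar, and the function names `to_json`, `parse_legacy`, and `parse_schema` are all assumptions made for the example.

```python
import json
import re

# Hypothetical, simplified type model: primitives are lowercase strings,
# arrays are dicts. This is NOT Spark's actual class hierarchy or JSON layout.


def to_json(data_type):
    """Serialize a nested type description to JSON text."""
    return json.dumps(data_type, sort_keys=True)


def parse_legacy(text):
    """Parse a simplified toString-like form such as 'ArrayType(IntegerType)'.

    Only primitives and arrays are handled; the real legacy format is richer.
    """
    text = text.strip()
    match = re.fullmatch(r"ArrayType\((.*)\)", text)
    if match:
        return {"type": "array", "elementType": parse_legacy(match.group(1))}
    match = re.fullmatch(r"(\w+)Type", text)
    if match:
        return match.group(1).lower()
    raise ValueError("unrecognized legacy type string: %r" % text)


def parse_schema(text):
    """Try JSON first; fall back to the legacy parser for old-style metadata."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return parse_legacy(text)
```

With this sketch, `parse_schema` accepts both a JSON string produced by `to_json` and an old-style string such as `ArrayType(IntegerType)`, which is the kind of compatibility the old parser is kept for.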
    
    @JoshRosen @davis Please help review the PySpark-related changes, thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark datatype-to-json

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2563.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2563
    
----
commit dca9153d213a9a9603d7b327d78750af66021ed2
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-09-25T09:28:06Z

    De/serializes DataType objects from/to JSON

commit 5f792df158128f6bf41a49e816a915150698a9d2
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-09-28T11:19:34Z

    Adds PySpark support

commit 26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-09-28T11:54:26Z

    Adds compatibility test case for Parquet type conversion

----


