GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/2563

    [SPARK-3713][SQL] Uses JSON to serialize DataType objects

    This PR serializes `DataType`s as JSON instead of relying on `toString`. The 
`toString` output is not only hard to parse but also fragile in many cases.
    
    Since we already write schema information to Parquet metadata in the old 
style, we have to preserve the old `DataType` parser to ensure backward 
compatibility. The old parser is now renamed to `CaseClassStringParser` and 
moved into `object DataType`.
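
    To illustrate the compatibility concern, here is a minimal sketch in plain
Python. It is not the code from this PR: the JSON shape, the function names
(`to_json`, `parse_legacy`, `from_string`), and the tiny subset of the old
string format it handles are all assumptions made for illustration. The idea
is to try JSON first and fall back to the old case-class-style string.

```python
import json

# Hypothetical toy model (not Spark's API): a type is either a primitive
# name such as "integer" or a dict describing a compound type.

def to_json(data_type):
    """Serialize a toy type description to a compact JSON string."""
    return json.dumps(data_type, sort_keys=True, separators=(",", ":"))

def parse_legacy(text):
    """Parse an old-style string such as 'ArrayType(IntegerType,true)'.

    Only primitives and ArrayType are handled in this sketch.
    """
    text = text.strip()
    if text.startswith("ArrayType(") and text.endswith(")"):
        inner = text[len("ArrayType("):-1]
        # The last comma separates the element type from the null flag.
        element, _, contains_null = inner.rpartition(",")
        return {
            "type": "array",
            "elementType": parse_legacy(element),
            "containsNull": contains_null.strip() == "true",
        }
    if text.endswith("Type") and text.isalnum():
        # 'IntegerType' -> 'integer'
        return text[:-len("Type")].lower()
    raise ValueError("unsupported legacy type string: %r" % text)

def from_string(text):
    """Try the JSON format first, then fall back to the legacy format."""
    try:
        return json.loads(text)
    except ValueError:
        return parse_legacy(text)
```

    With a fallback of this kind, metadata written in the old style stays
readable while newly written metadata uses JSON.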
    
    @JoshRosen @davis Please help review the PySpark-related changes, thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark datatype-to-json

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2563.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2563
    
----
commit dca9153d213a9a9603d7b327d78750af66021ed2
Author: Cheng Lian <[email protected]>
Date:   2014-09-25T09:28:06Z

    De/serializes DataType objects from/to JSON

commit 5f792df158128f6bf41a49e816a915150698a9d2
Author: Cheng Lian <[email protected]>
Date:   2014-09-28T11:19:34Z

    Adds PySpark support

commit 26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1
Author: Cheng Lian <[email protected]>
Date:   2014-09-28T11:54:26Z

    Adds compatibility test case for Parquet type conversion

----



