GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/2563
[SPARK-3713][SQL] Uses JSON to serialize DataType objects
This PR uses JSON instead of `toString` to serialize `DataType`s. The
latter is not only hard to parse but also flaky in many cases.
Since we already write schema information to Parquet metadata in the old
style, we have to keep the old `DataType` parser to ensure backward
compatibility. The old parser is now renamed to `CaseClassStringParser` and
moved into `object DataType`.
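To illustrate the idea, here is a minimal, self-contained Python sketch of a JSON round trip for a schema, with a fallback to a legacy string parser when the input is not valid JSON. It is not the code in this PR: the class names, the JSON layout, and `parse_legacy` (a toy stand-in for `CaseClassStringParser` that only knows two type names) are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class PrimitiveType:
    name: str  # hypothetical short name, e.g. "integer" or "string"


@dataclass(frozen=True)
class StructField:
    name: str
    data_type: "DataType"
    nullable: bool


@dataclass(frozen=True)
class StructType:
    fields: Tuple[StructField, ...]


DataType = Union[PrimitiveType, StructType]


def to_json_value(dt):
    """Convert a data type to a JSON-compatible value (assumed layout)."""
    if isinstance(dt, PrimitiveType):
        return dt.name
    return {
        "type": "struct",
        "fields": [
            {"name": f.name, "type": to_json_value(f.data_type), "nullable": f.nullable}
            for f in dt.fields
        ],
    }


def from_json_value(value):
    """Rebuild a data type from the value produced by to_json_value."""
    if isinstance(value, str):
        return PrimitiveType(value)
    if isinstance(value, dict) and value.get("type") == "struct":
        return StructType(tuple(
            StructField(f["name"], from_json_value(f["type"]), f["nullable"])
            for f in value["fields"]
        ))
    raise ValueError(f"unrecognized type description: {value!r}")


def parse_legacy(text):
    """Toy stand-in for the old string parser; handles two names only."""
    legacy_names = {"IntegerType": "integer", "StringType": "string"}
    if text in legacy_names:
        return PrimitiveType(legacy_names[text])
    raise ValueError(f"cannot parse legacy type string: {text!r}")


def parse_schema(text):
    """Try JSON first; fall back to the legacy format for old metadata."""
    try:
        return from_json_value(json.loads(text))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return parse_legacy(text)


schema = StructType((StructField("id", PrimitiveType("integer"), False),))
serialized = json.dumps(to_json_value(schema))
restored = parse_schema(serialized)
old_style = parse_schema("IntegerType")
```

Here `restored` equals `schema`, and `old_style` is produced by the fallback path because `"IntegerType"` is not valid JSON. Trying the new format first and falling back only on failure lets metadata written in the old style remain readable.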
@JoshRosen @davis Please help review the PySpark-related changes, thanks!
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liancheng/spark datatype-to-json
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2563.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2563
----
commit dca9153d213a9a9603d7b327d78750af66021ed2
Author: Cheng Lian
Date: 2014-09-25T09:28:06Z
De/serializes DataType objects from/to JSON
commit 5f792df158128f6bf41a49e816a915150698a9d2
Author: Cheng Lian
Date: 2014-09-28T11:19:34Z
Adds PySpark support
commit 26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1
Author: Cheng Lian
Date: 2014-09-28T11:54:26Z
Adds compatibility test case for Parquet type conversion
----