GitHub user holdenk opened a pull request:
https://github.com/apache/spark/pull/11856
[SPARK-12072][PYTHON][SQL][WIP] Python jdf dataframe schema breaks
## What changes were proposed in this pull request?
For large schemas, wrap the Java version of the schema instead of parsing
the JSON.
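
As a rough illustration of the idea (not the code in this PR), a wrapper can hold a handle to the JVM-side schema and fetch fields only on demand, so the full JSON is never serialized and parsed in Python. In the sketch below, `FakeJavaStructType` and `LazyWrappedSchema` are hypothetical names; the fake class stands in for a Py4J handle so the example runs without Spark, and its method names are assumptions modelled on the Scala `StructType`.

```python
import json


class FakeJavaStructType:
    """Hypothetical stand-in for a Py4J handle to a JVM StructType."""

    def __init__(self, fields):
        self._fields = list(fields)  # (name, type name) pairs
        self.json_calls = 0          # counts full-schema serializations

    def length(self):
        return len(self._fields)

    def fieldNames(self):
        return [name for name, _ in self._fields]

    def apply(self, index):
        return self._fields[index]

    def json(self):
        # The expensive path for large schemas: serialize everything.
        self.json_calls += 1
        return json.dumps([{"name": n, "type": t} for n, t in self._fields])


class LazyWrappedSchema:
    """Hypothetical wrapper: delegates to the Java-side schema, no JSON parsing."""

    def __init__(self, jschema):
        self._jschema = jschema
        self._field_cache = {}

    def __len__(self):
        return self._jschema.length()

    @property
    def names(self):
        return list(self._jschema.fieldNames())

    def __getitem__(self, index):
        # Fetch a single field on first access and remember it.
        if index not in self._field_cache:
            self._field_cache[index] = self._jschema.apply(index)
        return self._field_cache[index]


jschema = FakeJavaStructType([("id", "long"), ("name", "string"), ("score", "double")])
schema = LazyWrappedSchema(jschema)
print(len(schema), schema.names, schema[1])
```

In this sketch, `json()` is never called when looking up the length, the names, or a single field, which is the cost the wrapping approach is meant to avoid.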
## How was this patch tested?
Unit tests, plus a manual local run of the SQL test suite with all of the
schemas wrapped.
This is a WIP PR - hopefully the original creator of the issue can also
verify this fix solves their problem.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/holdenk/spark SPARK-12072-python-jdf-dataframe-schema-breaks
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11856.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11856
----
commit 7538b7aafc1a7fcc9989a26c9bd042fd4c64147f
Author: Holden Karau <[email protected]>
Date: 2015-12-10T22:23:48Z
A bit of progress playing around
commit 3db6344c09d14a384aac3e7f4a0ed9fe7afedc9e
Author: Holden Karau <[email protected]>
Date: 2015-12-11T00:32:54Z
A bit more progress. Still not sure about this direction
commit 70975220875ca8c558e27f7cf04c4329900ce572
Author: Holden Karau <[email protected]>
Date: 2015-12-11T01:39:26Z
A bit of progress
commit 65b46c243d94fd2fe78ea7728bb1b43fff18d242
Author: Holden Karau <[email protected]>
Date: 2015-12-12T04:23:32Z
Merge branch 'master' into SPARK-12072-python-jdf-dataframe-schema-breaks
commit fe076a14a6f583c640fbfde38863c9414e1b7122
Author: Holden Karau <[email protected]>
Date: 2016-03-16T18:34:08Z
Merge in master
commit e72945d24e5f10cf1fb709db020b848783fb35ed
Author: Holden Karau <[email protected]>
Date: 2016-03-17T20:30:21Z
Merge branch 'master' into SPARK-12072-python-jdf-dataframe-schema-breaks
commit 8ed0966e0903ea7b01f115283424480f29f6a4f4
Author: Holden Karau <[email protected]>
Date: 2016-03-17T22:08:23Z
Some more progress (allow creating of DataFrames using the WrappedSchema by
converting it back to Python schema which isn't great but works)
commit fbdd3a651356483b3d0cc700e0eb977201709b5b
Author: Holden Karau <[email protected]>
Date: 2016-03-17T22:12:33Z
Fix extracting Java schema
commit 15d6f9516b63cdca1f55c06d0db5f5a199593855
Author: Holden Karau <[email protected]>
Date: 2016-03-18T01:15:28Z
Add a generateSafeToInternal function so that we can (optionally) get a
safe to serialize toInternal converter for the wrapped Java StructType.
commit 77afc9976f54143c2cd5e2465bcc47f65c079cc0
Author: Holden Karau <[email protected]>
Date: 2016-03-18T01:30:24Z
Style fixes and fix ref to field
commit 6e6df515bf476af5111295c1204fd29f4d0901b6
Author: Holden Karau <[email protected]>
Date: 2016-03-18T07:13:35Z
Don't fetch the full field when determining if we need serialize or
toInternal (saves us fetching metadata)
commit fb6686edc84eab0c0fcec13416238e8353b15b89
Author: Holden Karau <[email protected]>
Date: 2016-03-19T22:32:36Z
Some progress on testing
commit ac9bec2c94a794b00ee70157c09c9c2b2b9850ff
Author: Holden Karau <[email protected]>
Date: 2016-03-20T07:49:27Z
Add the equality testing stuff and such
commit c920e19b0144f83acbb40ed48f74300e0f2429c6
Author: Holden Karau <[email protected]>
Date: 2016-03-20T08:16:39Z
Style fixes
----