GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/11856

    [SPARK-12072][PYTHON][SQL][WIP] Python jdf dataframe schema breaks

    ## What changes were proposed in this pull request?
    
    For large schemas, wrap the Java (JVM-side) schema object instead of parsing its JSON representation.
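
A minimal sketch of the idea described above, assuming nothing about the actual patch: rather than serializing the whole schema to JSON and parsing it in Python, keep a handle to the JVM-side object and ask it only for what is needed. The name `WrappedSchema` is borrowed from a commit message in this PR, but its interface here is an assumption. `FakeJavaSchema` is a hypothetical stand-in for the Py4J proxy so that the example runs without Spark, and `to_python` is likewise invented for illustration.

```python
import json


class FakeJavaSchema:
    """Hypothetical stand-in for a JVM-side StructType reached through Py4J."""

    def __init__(self, fields):
        self._fields = fields  # list of (name, type_name) pairs
        self.json_calls = 0    # counts how often the full schema is serialized

    def json(self):
        # Serializes every field; this is the costly path for large schemas.
        self.json_calls += 1
        return json.dumps({
            "type": "struct",
            "fields": [
                {"name": n, "type": t, "nullable": True, "metadata": {}}
                for n, t in self._fields
            ],
        })

    def fieldNames(self):
        return [n for n, _ in self._fields]

    def length(self):
        return len(self._fields)


def parse_schema_eagerly(jschema):
    """Eager approach: serialize the whole schema and parse the JSON."""
    return json.loads(jschema.json())["fields"]


class WrappedSchema:
    """Assumed interface: delegate cheap queries to the Java object and
    fall back to the JSON round trip only when a Python schema is needed."""

    def __init__(self, jschema):
        self._jschema = jschema
        self._parsed = None

    @property
    def names(self):
        return list(self._jschema.fieldNames())

    def __len__(self):
        return self._jschema.length()

    def to_python(self):
        # Hypothetical helper: parse once, then reuse the cached result.
        if self._parsed is None:
            self._parsed = parse_schema_eagerly(self._jschema)
        return self._parsed


# Usage: cheap queries never trigger the JSON round trip.
jschema = FakeJavaSchema([("c%d" % i, "integer") for i in range(1000)])
wrapped = WrappedSchema(jschema)
print(len(wrapped), wrapped.names[:2], jschema.json_calls)
```

In this sketch the JSON round trip is deferred until `to_python()` is called, and its result is cached so that later calls do not serialize the schema again.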
    
    ## How was this patch tested?
    
    Unit tests, plus a manual local run of the SQL test suite with all of the schemas wrapped.
    
    This is a WIP PR - hopefully the original creator of the issue can also verify that this fix solves their problem.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark SPARK-12072-python-jdf-dataframe-schema-breaks

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11856
    
----
commit 7538b7aafc1a7fcc9989a26c9bd042fd4c64147f
Author: Holden Karau <[email protected]>
Date:   2015-12-10T22:23:48Z

    A bit of progress playing around

commit 3db6344c09d14a384aac3e7f4a0ed9fe7afedc9e
Author: Holden Karau <[email protected]>
Date:   2015-12-11T00:32:54Z

    A bit more progress. Still not sure about this direction

commit 70975220875ca8c558e27f7cf04c4329900ce572
Author: Holden Karau <[email protected]>
Date:   2015-12-11T01:39:26Z

    A bit of progress

commit 65b46c243d94fd2fe78ea7728bb1b43fff18d242
Author: Holden Karau <[email protected]>
Date:   2015-12-12T04:23:32Z

    Merge branch 'master' into SPARK-12072-python-jdf-dataframe-schema-breaks

commit fe076a14a6f583c640fbfde38863c9414e1b7122
Author: Holden Karau <[email protected]>
Date:   2016-03-16T18:34:08Z

    Merge in master

commit e72945d24e5f10cf1fb709db020b848783fb35ed
Author: Holden Karau <[email protected]>
Date:   2016-03-17T20:30:21Z

    Merge branch 'master' into SPARK-12072-python-jdf-dataframe-schema-breaks

commit 8ed0966e0903ea7b01f115283424480f29f6a4f4
Author: Holden Karau <[email protected]>
Date:   2016-03-17T22:08:23Z

    Some more progress (allow creating DataFrames using the WrappedSchema by converting it back to a Python schema, which isn't great but works)

commit fbdd3a651356483b3d0cc700e0eb977201709b5b
Author: Holden Karau <[email protected]>
Date:   2016-03-17T22:12:33Z

    Fix extracting Java schema

commit 15d6f9516b63cdca1f55c06d0db5f5a199593855
Author: Holden Karau <[email protected]>
Date:   2016-03-18T01:15:28Z

    Add a generateSafeToInternal function so that we can (optionally) get a safe-to-serialize toInternal converter for the wrapped Java StructType.

commit 77afc9976f54143c2cd5e2465bcc47f65c079cc0
Author: Holden Karau <[email protected]>
Date:   2016-03-18T01:30:24Z

    Style fixes and fix ref to field

commit 6e6df515bf476af5111295c1204fd29f4d0901b6
Author: Holden Karau <[email protected]>
Date:   2016-03-18T07:13:35Z

    Don't fetch the full field when determining whether serialization or toInternal is needed (saves us fetching metadata)

commit fb6686edc84eab0c0fcec13416238e8353b15b89
Author: Holden Karau <[email protected]>
Date:   2016-03-19T22:32:36Z

    Some progress on testing

commit ac9bec2c94a794b00ee70157c09c9c2b2b9850ff
Author: Holden Karau <[email protected]>
Date:   2016-03-20T07:49:27Z

    Add the equality testing stuff and such

commit c920e19b0144f83acbb40ed48f74300e0f2429c6
Author: Holden Karau <[email protected]>
Date:   2016-03-20T08:16:39Z

    Style fixes

----


