GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/7301
[SPARK-] Refactor of serialization for Python DataFrame
This PR fixes the long-standing issue of serialization between Python RDD and
DataFrame. It switches to a customized Pickler for InternalRow to enable
customized unpickling (type conversion, especially for UDT), so we can now
support UDT for UDF. cc @mengxr
There is no generated `Row` anymore.
TODO: improve performance for row.col for many columns.
This is based on #7131; I will rebase after that is merged.
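To illustrate the general idea of a customized pickler paired with customized unpickling, here is a minimal sketch using only Python's standard `pickle` module. It is not the code in this PR: `Point`, `RowPickler`, `RowUnpickler`, `dumps`, and `loads` are hypothetical names, and `Point` merely stands in for a UDT. The pickler writes the custom value in a tagged, plain form, and the unpickler converts it back to the user-facing type while loading.

```python
import io
import pickle


class Point:
    """Hypothetical user-defined type (stand-in for a UDT)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class RowPickler(pickle.Pickler):
    """Hypothetical pickler: writes Point values as a tagged, plain tuple."""

    def persistent_id(self, obj):
        if isinstance(obj, Point):
            return ("Point", obj.x, obj.y)
        return None  # everything else is pickled the usual way


class RowUnpickler(pickle.Unpickler):
    """Hypothetical unpickler: converts tagged tuples back into Point objects."""

    def persistent_load(self, pid):
        tag, *fields = pid
        if tag == "Point":
            return Point(*fields)
        raise pickle.UnpicklingError("unknown tag: %r" % (tag,))


def dumps(row):
    """Serialize a row (a plain tuple here) with the custom pickler."""
    buf = io.BytesIO()
    RowPickler(buf, protocol=2).dump(row)
    return buf.getvalue()


def loads(data):
    """Deserialize a row, applying the type conversion during unpickling."""
    return RowUnpickler(io.BytesIO(data)).load()


row = (1, "a", Point(1.0, 2.0))
restored = loads(dumps(row))
```

Because the conversion happens inside the unpickler, the receiving side gets fully typed values without a separate post-processing pass over each row.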
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark sql_ser
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7301.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7301
----
commit c46814a974b15862cb483e7104a7232ec9af9cd5
Author: Davies Liu <[email protected]>
Date: 2015-06-30T19:20:05Z
convert decimal for Python DataFrames
commit c99e8c5e6ec81ade36827f5ed745c159f5d5cdac
Author: Davies Liu <[email protected]>
Date: 2015-06-30T20:18:10Z
fix mima
commit 829a05be36b64ff08486e48e83d5f26f1c919907
Author: Davies Liu <[email protected]>
Date: 2015-06-30T23:28:27Z
fix UDT in python
commit 9cd5a213b5762e6e3fe27d1ae1f61dee7f2ee798
Author: Davies Liu <[email protected]>
Date: 2015-06-30T23:36:15Z
run python tests with SPARK_PREPEND_CLASSES
commit 7104e97f01f47c6405bc8f8e51b5485ddca27efe
Author: Davies Liu <[email protected]>
Date: 2015-07-01T21:40:34Z
improve type infer
commit 6cdd86a68bfa698455d65dae3d718faa29e0e4e0
Author: Davies Liu <[email protected]>
Date: 2015-07-01T21:51:57Z
Merge branch 'master' of github.com:apache/spark into decimal_python
commit 7d73168de9b45cb15229f9fe2e7e97304aa8c375
Author: Davies Liu <[email protected]>
Date: 2015-07-01T21:53:17Z
fix conflit
commit 20531d60c49519d8d4a5680c1d421f08f99bfb43
Author: Davies Liu <[email protected]>
Date: 2015-07-07T00:16:51Z
Merge branch 'master' of github.com:apache/spark into decimal_python
Conflicts:
project/MimaExcludes.scala
commit 0972c07007cc798bfac2be94b90e57f84a7d5408
Author: Davies Liu <[email protected]>
Date: 2015-07-08T22:01:53Z
Refactor of serialization for Python DataFrame
----