GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/7301

    [SPARK-]  Refactor of serialization for Python DataFrame 

    This PR fixes the long-standing serialization issue between Python RDDs and 
DataFrames. It switches to a customized Pickler for InternalRow, which enables 
customized unpickling (type conversion, especially for UDTs), so UDTs are now 
supported in UDFs. cc @mengxr
    
    There is no generated `Row` anymore.
    
    TODO: improve the performance of `row.col` access for rows with many columns.
    
    This is based on #7131; I will rebase after that is merged.
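
For readers unfamiliar with the idea of a customized pickler with customized unpickling, here is a minimal Python sketch. It is not the code in this PR: the `Row` stand-in, `RowPickler`, `RowUnpickler`, the `converters` table, and the `dumps`/`loads` helpers are all hypothetical names, and it relies only on the standard library's `persistent_id`/`persistent_load` hooks. It shows how a row can travel as plain field names and values and be rebuilt on the receiving side with per-field type conversion, without a generated `Row` class.

```python
import io
import pickle
from decimal import Decimal


class Row(tuple):
    """Hypothetical stand-in: a plain tuple that remembers its field names."""

    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        row.__fields__ = tuple(fields)
        return row

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        try:
            # Linear scan over field names; slow for very wide rows.
            return self[self.__fields__.index(name)]
        except ValueError:
            raise AttributeError(name)


class RowPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Write rows as a tagged tuple of names and values instead of
        # pickling the Row class by reference.
        if isinstance(obj, Row):
            return ("Row", obj.__fields__, tuple(obj))
        return None  # everything else is pickled normally


class RowUnpickler(pickle.Unpickler):
    # Hypothetical conversion table: field name -> converter.
    converters = {"price": Decimal}

    def persistent_load(self, pid):
        tag, fields, values = pid
        if tag != "Row":
            raise pickle.UnpicklingError("unsupported persistent id: %r" % (tag,))
        # Apply the type conversion while the row is being rebuilt.
        converted = [
            self.converters.get(field, lambda v: v)(value)
            for field, value in zip(fields, values)
        ]
        return Row(fields, converted)


def dumps(obj):
    buf = io.BytesIO()
    RowPickler(buf, protocol=2).dump(obj)
    return buf.getvalue()


def loads(data):
    return RowUnpickler(io.BytesIO(data)).load()
```

In this sketch, `loads(dumps(Row(["name", "price"], ["tea", "9.99"])))` yields a `Row` whose `price` field has been converted to a `Decimal`. The linear scan in `__getattr__` also illustrates why attribute access can become costly for rows with many columns.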

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark sql_ser

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7301
    
----
commit c46814a974b15862cb483e7104a7232ec9af9cd5
Author: Davies Liu <[email protected]>
Date:   2015-06-30T19:20:05Z

    convert decimal for Python DataFrames

commit c99e8c5e6ec81ade36827f5ed745c159f5d5cdac
Author: Davies Liu <[email protected]>
Date:   2015-06-30T20:18:10Z

    fix mima

commit 829a05be36b64ff08486e48e83d5f26f1c919907
Author: Davies Liu <[email protected]>
Date:   2015-06-30T23:28:27Z

    fix UDT in python

commit 9cd5a213b5762e6e3fe27d1ae1f61dee7f2ee798
Author: Davies Liu <[email protected]>
Date:   2015-06-30T23:36:15Z

    run python tests with SPARK_PREPEND_CLASSES

commit 7104e97f01f47c6405bc8f8e51b5485ddca27efe
Author: Davies Liu <[email protected]>
Date:   2015-07-01T21:40:34Z

    improve type infer

commit 6cdd86a68bfa698455d65dae3d718faa29e0e4e0
Author: Davies Liu <[email protected]>
Date:   2015-07-01T21:51:57Z

    Merge branch 'master' of github.com:apache/spark into decimal_python

commit 7d73168de9b45cb15229f9fe2e7e97304aa8c375
Author: Davies Liu <[email protected]>
Date:   2015-07-01T21:53:17Z

    fix conflit

commit 20531d60c49519d8d4a5680c1d421f08f99bfb43
Author: Davies Liu <[email protected]>
Date:   2015-07-07T00:16:51Z

    Merge branch 'master' of github.com:apache/spark into decimal_python
    
    Conflicts:
        project/MimaExcludes.scala

commit 0972c07007cc798bfac2be94b90e57f84a7d5408
Author: Davies Liu <[email protected]>
Date:   2015-07-08T22:01:53Z

    Refactor of serialization for Python DataFrame

----

