[GitHub] spark pull request: [WIP] SPARK-1416: PySpark support for Sequence...

ahirreddy Sun, 20 Apr 2014 00:32:52 -0700

Github user ahirreddy commented on the pull request:

    https://github.com/apache/spark/pull/455#issuecomment-40889685
  
    I haven't had a chance to look too deeply, but I think Pyrolite can be 
useful. One benefit, in addition to not depending on msgpack, is that we can 
do the RDD-to-PythonRDD conversion in the JVM without calling out to Python. In 
the SparkSQL API, I basically convert an RDD of java.util.Maps to Python 
dictionaries. If we can read `Writable`s into an RDD of Java objects, that RDD 
can easily be converted to a PythonRDD.
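    A minimal sketch of the Python half of that exchange, assuming the JVM side pickles each java.util.Map as a Python dict (per Pyrolite's type-mapping table) and writes pickle protocol 2. Python's own `pickle` module stands in for Pyrolite's Pickler here, and `fake_jvm_pickle` and `load_batch` are hypothetical helpers, not PySpark functions. The point is only that the Python side needs nothing beyond `pickle.loads` to get dicts back.

```python
import pickle


def fake_jvm_pickle(records):
    # Hypothetical stand-in for the JVM side: Python's pickle plays the
    # role of Pyrolite's Pickler, writing one batch at protocol 2.
    return pickle.dumps(records, protocol=2)


def load_batch(payload):
    """Decode one pickled batch and copy each record into a plain dict."""
    batch = pickle.loads(payload)
    return [dict(r) for r in batch]


rows = [{"key": 1, "value": "a"}, {"key": 2, "value": "b"}]
payload = fake_jvm_pickle(rows)
print(load_batch(payload))  # [{'key': 1, 'value': 'a'}, {'key': 2, 'value': 'b'}]
```

    With the conversion done this way, the JVM decides the Python-side types, so no Python code has to run just to turn `Writable`s into dicts.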
    
    The supported Java-Python type mappings are listed here: 
https://github.com/irmen/Pyrolite/
    
    Here are the Java-to-Python and Python-to-Java conversion functions I added in #363:
    
    
https://github.com/apache/spark/pull/363/files?w=1#diff-0a67bc4d171abe4df8eb305b0f4123a2R290
    
https://github.com/apache/spark/pull/363/files?w=1#diff-1b97e54687301e5840bb97e576f83ee6R316
