Unfortunately there isn't a dedicated guide yet, but there is a PySpark internals 
overview at 
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals. That's the 
best thing to follow for now.

In terms of MLlib and GraphX, I think MLlib will be easier to expose first: it's 
designed to be easy to call from Java, and we've already created Python bindings 
for many of its algorithms that work with NumPy arrays. (A couple of new 
algorithms have been added since those bindings were written, though.) GraphX 
currently isn't easy to call from Java and will be even harder to expose in 
Python, so I'd start with a Java API for it first.
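
For concreteness, here's roughly what calling one of the existing NumPy-backed 
MLlib bindings looks like (an untested sketch assuming a local SparkContext; 
KMeans is just one example of the exposed algorithms):

    import numpy as np
    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext("local", "mllib-example")
    # NumPy arrays are accepted directly by the Python MLlib bindings
    points = sc.parallelize([np.array([0.0, 0.0]), np.array([1.0, 1.0]),
                             np.array([9.0, 8.0]), np.array([8.0, 9.0])])
    # train() runs the Scala implementation on the JVM; only the input
    # data and the resulting model cross the Python/JVM boundary
    model = KMeans.train(points, k=2, maxIterations=10)
    print(model.predict(np.array([0.5, 0.5])))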

BTW, in both cases we want to call the existing JVM code from Python rather than 
reimplement the algorithms in Python. That will be a lot more efficient, and 
more maintainable as well.
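
Concretely, each PySpark driver holds a Py4J gateway to the JVM, and sc._jvm 
exposes JVM classes through it. A quick sketch of the pattern (the System call 
here is just a stand-in for a real MLlib or GraphX entry point):

    from pyspark import SparkContext

    sc = SparkContext("local", "gateway-example")
    # sc._jvm is a Py4J view of the JVM running alongside the driver;
    # attribute access resolves to JVM packages and classes
    jvm = sc._jvm
    # Invoke a JVM method through the gateway instead of porting its
    # logic to Python (here, a trivial built-in as a placeholder)
    print(jvm.java.lang.System.currentTimeMillis())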

Matei

On Mar 16, 2014, at 5:59 AM, Krakna H <shankark+...@gmail.com> wrote:

> Is there any documentation on contributing pyspark ports of additions to 
> Spark? I only see guidelines on Scala contributions 
> (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark). 
> Specifically, I'm interested in porting mllib and graphx contributions.
