Re: Python implementation of RDD interface

2015-05-29 Thread Davies Liu
DPark can also run on localhost without a Mesos cluster (single-threaded or multi-process). I also think that running PySpark without a JVM in local mode will help development, so pysparkling and DPark are both useful. On Fri, May 29, 2015 at 1:36 PM, Sven Kreiss wrote: > I have to admit that I n

Re: Python implementation of RDD interface

2015-05-29 Thread Sven Kreiss
I have to admit that I have never run DPark. I think the goals are very different. The purpose of pysparkling is not to reproduce Spark on a cluster, but to provide a lightweight implementation with the same interface that runs locally or on an API server. I still run PySpark on a cluster to preprocess a larg

Re: Python implementation of RDD interface

2015-05-29 Thread Davies Liu
There is another implementation of the RDD interface in Python, called DPark [1]. Could you say a few words comparing the two? [1] https://github.com/douban/dpark/ On Fri, May 29, 2015 at 8:29 AM, Sven Kreiss wrote: > I wanted to share a Python implementation of RDDs: pysparkling. > > http://tri