pandas-like dataframe in spark

2014-09-04 Thread Mohit Jaggi
Folks, I have been working on a pandas-like dataframe DSL on top of Spark. It is written in Scala and can be used from spark-shell. The APIs have the look and feel of pandas, which is a wildly popular piece of software among data scientists. The goal is to let people familiar with pandas scale their

Re: pandas-like dataframe in spark

2014-09-04 Thread Matei Zaharia
Hi Mohit, This looks pretty interesting, but just a note on the implementation -- it might be worthwhile to try doing this on top of Spark SQL SchemaRDDs. The reason is that SchemaRDDs already have an efficient in-memory representation (columnar storage), and can be read from a variety of data

Re: pandas-like dataframe in spark

2014-09-04 Thread Mohit Jaggi
Thanks, Matei. I will take a look at SchemaRDDs.
On Thu, Sep 4, 2014 at 11:24 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi Mohit, This looks pretty interesting, but just a note on the implementation -- it might be worthwhile to try doing this on top of Spark SQL SchemaRDDs. The