Folks,
I have been working on a pandas-like dataframe DSL on top of Spark. It is
written in Scala and can be used from spark-shell. The APIs have the look
and feel of pandas, which is a wildly popular piece of software that data
scientists use. The goal is to let people familiar with pandas scale their
Hi Mohit,
This looks pretty interesting, but just a note on the implementation: it
might be worthwhile to try doing this on top of Spark SQL SchemaRDDs. The
reason is that SchemaRDDs already have an efficient in-memory representation
(columnar storage), and can be read from a variety of data
Thanks Matei. I will take a look at SchemaRDDs.
On Thu, Sep 4, 2014 at 11:24 AM, Matei Zaharia matei.zaha...@gmail.com
wrote: