Folks, I sent an email announcing https://github.com/AyasdiOpenSource/df
This dataframe is essentially a map of column RDDs (plus some DSL sugar), since column-based operations seem to be the most common. But row operations are not uncommon either. To get rows out of the columns, I currently zip the column RDDs together with RDD.zip and then flatten the resulting tuples. I realize that RDD.zipPartitions might be faster, but I believe an even better approach should be possible: surely we could have a zip method that combines an arbitrary number of RDDs? Could that be added to Spark core? Or is there an alternative, equally good or better approach?

Cheers,
Mohit.
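For concreteness, here is a sketch of what I mean, folding the pairwise zip-and-flatten into one helper built on zipPartitions. `zipAll` is a hypothetical name (not an existing Spark API), and it assumes what RDD.zip already requires: all column RDDs have the same number of partitions and the same number of elements per partition.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper: zip an arbitrary number of same-typed column RDDs
// into an RDD of rows, chaining zipPartitions pairwise so no intermediate
// tuple RDDs are materialized. Precondition (same as RDD.zip): every RDD
// has identical partition counts and per-partition element counts.
def zipAll[T: ClassTag](columns: Seq[RDD[T]]): RDD[Seq[T]] = {
  require(columns.nonEmpty, "need at least one column RDD")
  val first: RDD[Seq[T]] = columns.head.map(Seq(_))
  columns.tail.foldLeft(first) { (rows, col) =>
    rows.zipPartitions(col) { (rowIter, colIter) =>
      // Append this column's value to each partially built row.
      rowIter.zip(colIter).map { case (row, value) => row :+ value }
    }
  }
}

// Usage sketch: rows is an RDD[Seq[Double]] with one Seq per row.
// val rows = zipAll(Seq(colA, colB, colC))
```

This still builds each row incrementally (appending per column), so a native variable-arity zip in Spark core could presumably do better by pulling one element from each of N iterators per row in a single pass.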