Folks, I sent an email announcing https://github.com/AyasdiOpenSource/df
This dataframe is essentially a map of column RDDs (plus some DSL sugar), since column-based operations seem to be the most common. But row operations are not uncommon either. To get rows out of the columns, I currently zip the column RDDs together with RDD.zip and then flatten the resulting tuples. I realize that RDD.zipPartitions might be faster, but I believe an even better approach should be possible: surely we could have a zip method that combines an arbitrary number of RDDs? Could that be added to Spark core? Or is there an alternative, equally good or better approach?

Cheers,
Mohit.
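For concreteness, here is a sketch of what I mean, folding the pairwise zip-and-flatten into one helper built on zipPartitions. `zipAll` is a hypothetical name (not an existing Spark API), and it assumes what RDD.zip already requires: all column RDDs have the same number of partitions and the same number of elements per partition.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper: zip an arbitrary number of same-typed column RDDs
// into an RDD of rows, chaining zipPartitions pairwise so no intermediate
// tuple RDDs are materialized. Precondition (same as RDD.zip): every RDD
// has identical partition counts and per-partition element counts.
def zipAll[T: ClassTag](columns: Seq[RDD[T]]): RDD[Seq[T]] = {
  require(columns.nonEmpty, "need at least one column RDD")
  val first: RDD[Seq[T]] = columns.head.map(Seq(_))
  columns.tail.foldLeft(first) { (rows, col) =>
    rows.zipPartitions(col) { (rowIter, colIter) =>
      // Append this column's value to each partially built row.
      rowIter.zip(colIter).map { case (row, value) => row :+ value }
    }
  }
}

// Usage sketch: rows is an RDD[Seq[Double]] with one Seq per row.
// val rows = zipAll(Seq(colA, colB, colC))
```

This still builds each row incrementally (appending per column), so a native variable-arity zip in Spark core could presumably do better by pulling one element from each of N iterators per row in a single pass.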