Split RDD along columns

Schein, Sagi Thu, 29 Jan 2015 04:41:00 -0800

Hi,

I have the following use case: my data is stored in, e.g., HDFS, as a single 
sequence file containing rows of CSV entries, which I can split to build an 
RDD of arrays of (smaller) strings.
What I want to do is build two RDDs, where the first RDD contains one subset 
of the columns and the second RDD contains another subset.
Is there a map-like API that could do this?


BTW - I know that one can build multiple flows, each of which calls map and 
selects the proper columns. Is there a faster way?
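
For reference, the multiple-map approach mentioned above might look roughly 
like the sketch below (PySpark style). The names select_columns, 
split_columns, left_cols and right_cols are hypothetical, and the code assumes 
the rows have already been split into lists of strings. It only relies on the 
RDD methods map and cache.

```python
# Sketch only: all function and parameter names here are made up for illustration.

def select_columns(row, indices):
    """Return the fields of `row` at the given column positions, in order."""
    return [row[i] for i in indices]


def split_columns(rows, left_cols, right_cols):
    """Derive two RDDs from one parent RDD of already-split rows.

    `rows` is assumed to be a Spark RDD whose elements are lists of strings.
    """
    # cache() marks the parent RDD for reuse, so the input should be read and
    # parsed once rather than once per derived RDD.
    rows.cache()
    left = rows.map(lambda row: select_columns(row, left_cols))
    right = rows.map(lambda row: select_columns(row, right_cols))
    return left, right
```

Assuming an existing SparkContext sc, the parent RDD could be built with 
something like sc.sequenceFile(path).values().map(lambda line: line.split(",")) 
and then passed in as split_columns(rows, [0, 1], [2, 3]). This still runs one 
map per output RDD; caching the parent only avoids re-reading the file.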

Sagi
