Hi, I have the following use case. Assume my data is in, e.g., HDFS: a single SequenceFile containing rows of CSV entries, which I can split to build an RDD of arrays of (smaller) strings. I want to build two RDDs, where the first contains one subset of the columns and the second contains another subset. Is there a map-like API that can do this?
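A minimal sketch of the usual workaround, under stated assumptions: the sample rows and the column indices `COLS_A`/`COLS_B` are hypothetical, and plain Python lists stand in for RDDs so the sketch runs without a Spark cluster. The idea is to parse each line only once and then derive both column subsets from the same parsed rows:

```python
import csv
import io

# Hypothetical sample lines standing in for the CSV rows read from HDFS.
LINES = ["1,alice,30,NY", "2,bob,25,SF"]

# Hypothetical column subsets for the two outputs.
COLS_A = [0, 1]
COLS_B = [2, 3]


def parse(line):
    """Split one CSV line into a list of string fields."""
    return next(csv.reader(io.StringIO(line)))


def project(row, cols):
    """Keep only the fields at the given column indices."""
    return [row[i] for i in cols]


# Parse every line once and keep the result
# (the analogue of caching the parsed RDD in Spark).
parsed = [parse(line) for line in LINES]

# Derive the two column subsets from the same parsed rows.
subset_a = [project(row, COLS_A) for row in parsed]
subset_b = [project(row, COLS_B) for row in parsed]
```

In Spark the same shape would be roughly `parsed = lines.map(parse).cache()` followed by two `map` calls, one per subset, where `lines` is the RDD of raw CSV strings. Each `map` still scans the parsed data, but caching means the file is read and split once rather than once per output RDD.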
BTW, I know that one can iteratively build multiple flows, each calling map to select the proper columns. Is there a faster way?
Sagi