saving partitions separately

Vipul Pandey Thu, 06 Feb 2014 23:59:56 -0800

Hi,

I have a dataset which, after a few transformations, takes the following shape:


org.apache.spark.rdd.RDD[(String, (String, Double))]

There are only a handful of possible keys (< 100). What I want to do is 
save the data for each key in a separate file. One way to do that is to filter 
the RDD once per key in a loop and save each filtered RDD separately. 
I was wondering if there is a more direct way of doing this. Maybe by 
repartitioning on the key somehow, or by grouping by key? But then how do we 
save each group separately without looping through the keys?
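
For concreteness, the filter-per-key loop described above might look roughly like the following. This is only a sketch of that baseline approach, not a recommended solution: the object name `SavePerKey`, the helper `savePerKey`, the sample data, and the output layout (one directory per key under a temporary folder) are all assumptions made for illustration. It also assumes the keys are safe to use as directory names.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SavePerKey {
  // Hypothetical helper: writes one output directory per distinct key
  // and returns the keys that were written.
  def savePerKey(data: RDD[(String, (String, Double))], outputDir: String): Seq[String] = {
    // The RDD is scanned once per key, so keep it in memory between passes.
    data.cache()

    // Fewer than 100 keys are expected, so collecting them to the driver is cheap.
    val keys = data.map(_._1).distinct().collect().sorted.toSeq

    for (key <- keys) {
      data
        .filter { case (k, _) => k == key } // keep only this key's records
        .map(_._2)                          // drop the key, keep (String, Double)
        .saveAsTextFile(s"$outputDir/$key") // assumes the key is a valid path segment
    }
    keys
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("save-per-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data with the same shape as the RDD in question.
      val data = sc.parallelize(Seq(
        ("a", ("x", 1.0)),
        ("b", ("y", 2.0)),
        ("a", ("z", 3.0))
      ))
      val out = Files.createTempDirectory("per-key").toString
      savePerKey(data, out)
    } finally {
      sc.stop()
    }
  }
}
```

Each pass launches a separate Spark job over the full dataset, which is the cost the question is hoping to avoid.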

Thanks,
Vipul
