RDD cache question

Yadid Ayzenberg Sat, 30 Nov 2013 18:02:59 -0800



Hi All,

I'm trying to implement the following and would like to know where I 
should be calling RDD.cache():

Suppose I have a group of RDDs, RDD1 to RDDn, as input.

1. Create a single RDD_total = RDD1.union(RDD2)...union(RDDn)

2. For i = 0 to x: RDD_total = RDD_total.map(some map function());

3. Return RDD_total.

I think that I should cache RDD_total in order to optimize the iterations. 
Should I just be calling RDD_total.cache() at the end of each iteration, or 
should I be performing something more elaborate:


RDD_temp = RDD_total.map (some map function());
RDD_total.unpersist();
RDD_total = RDD_temp.cache();
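
To make the two options concrete, here is a rough Python sketch of both. It is only an illustration under assumptions: the function names `cache_each_iteration` and `swap_and_unpersist` are made up, and `LocalRDD` is a hypothetical in-memory stand-in (not Spark) that exists only so the sketch runs without a cluster. The two functions rely on nothing beyond `union`, `map`, `cache` and `unpersist`, so they should also accept real PySpark RDDs.

```python
from functools import reduce


class LocalRDD:
    """Hypothetical stand-in for an RDD; NOT part of Spark.

    map() and union() are lazy, cache() memoizes the first computed
    result, and unpersist() drops that memo again.
    """

    def __init__(self, compute):
        self._compute = compute  # zero-argument function returning a list
        self._cached = False
        self._data = None

    def collect(self):
        if not self._cached:
            return self._compute()
        if self._data is None:
            self._data = self._compute()
        return list(self._data)

    def map(self, f):
        return LocalRDD(lambda: [f(v) for v in self.collect()])

    def union(self, other):
        return LocalRDD(lambda: self.collect() + other.collect())

    def cache(self):
        self._cached = True
        return self

    def unpersist(self):
        self._cached = False
        self._data = None
        return self


def cache_each_iteration(rdds, step, x):
    """Option 1: call cache() on RDD_total at the end of every iteration."""
    total = reduce(lambda a, b: a.union(b), rdds)  # step 1: union all inputs
    for _ in range(x):                             # step 2: repeated map
        total = total.map(step)
        total.cache()
    return total                                   # step 3


def swap_and_unpersist(rdds, step, x):
    """Option 2: map into a temporary, unpersist the old RDD, cache the new one."""
    total = reduce(lambda a, b: a.union(b), rdds)
    for _ in range(x):
        temp = total.map(step)
        total.unpersist()
        total = temp.cache()
    return total
```

For example, `swap_and_unpersist([LocalRDD(lambda: [1, 2]), LocalRDD(lambda: [3])], lambda v: v + 1, 3).collect()` applies the map three times to the unioned data. Both functions produce the same elements; they differ only in which intermediate results stay marked as cached.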



Thanks,
Yadid
