Hi All,

I'm trying to implement the following and would like to know where I
should be calling RDD.cache():

Suppose I have a group of RDDs, RDD1 to RDDn, as input.

1. Create a single RDD_total = RDD1.union(RDD2)...union(RDDn)

2. for i = 0 to x:    RDD_total = RDD_total.map(some map function());

3. Return RDD_total.

I think that I should cache RDD_total in order to optimize the iterations.
Should I just be calling RDD_total.cache() at the end of each iteration? Or
should I be performing something more elaborate:


RDD_temp = RDD_total.map (some map function());
RDD_total.unpersist();
RDD_total = RDD_temp.cache();
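
To make the second variant concrete, here is a minimal sketch in plain Python. It is only a toy model and not Spark: LazyDataset, union_all and iterate are hypothetical names I made up for illustration, and the class merely mimics a lazy map() plus cache()/unpersist(). It says nothing about how Spark actually stores or evicts cached partitions.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark).

    map() is lazy; collect() walks the chain of parents and applies the
    map functions; cache() keeps the first collected result in memory.
    """

    def __init__(self, source: Optional[Iterable] = None,
                 parent: Optional["LazyDataset"] = None,
                 fn: Optional[Callable] = None) -> None:
        self._source = source
        self._parent = parent
        self._fn = fn
        self._cache_enabled = False
        self._cached: Optional[List] = None

    def map(self, fn: Callable) -> "LazyDataset":
        # Nothing is computed here; only the lineage is recorded.
        return LazyDataset(parent=self, fn=fn)

    def cache(self) -> "LazyDataset":
        # Marks the dataset for caching; data is stored on first collect().
        self._cache_enabled = True
        return self

    def unpersist(self) -> "LazyDataset":
        # Drops any stored data and stops caching this dataset.
        self._cache_enabled = False
        self._cached = None
        return self

    def collect(self) -> List:
        if self._cached is not None:
            return list(self._cached)
        if self._parent is None:
            data = list(self._source)
        else:
            data = [self._fn(v) for v in self._parent.collect()]
        if self._cache_enabled:
            self._cached = data
        return list(data)


def union_all(datasets: List[LazyDataset]) -> LazyDataset:
    # Simplification: the toy union reads its inputs eagerly.
    merged: List = []
    for d in datasets:
        merged.extend(d.collect())
    return LazyDataset(source=merged)


def iterate(inputs: List[LazyDataset], x: int, fn: Callable) -> LazyDataset:
    """Steps 1 to 3 of the question, using the map/unpersist/cache swap."""
    total = union_all(inputs).cache()
    for _ in range(x):
        temp = total.map(fn)   # RDD_temp = RDD_total.map(...)
        total.unpersist()      # RDD_total.unpersist()
        total = temp.cache()   # RDD_total = RDD_temp.cache()
    return total
```

In this toy model the loop itself computes nothing, since every map() is lazy. The map function only runs when collect() is called on the returned dataset, and because only the last dataset is still marked as cached, only that final result is kept in memory.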



Thanks,
Yadid





