Hi All,
I'm trying to implement the following and would like to know where I
should be calling RDD.cache().
Suppose I have a group of RDDs, RDD1 to RDDn, as input:
1. Create a single RDD_total = RDD1.union(RDD2)...union(RDDn)
2. For i = 0 to x: RDD_total = RDD_total.map(some map function)
3. Return RDD_total.
I think that I should cache RDD_total in order to optimize the iterations.
Should I just be calling RDD_total.cache() at the end of each iteration? Or
should I be performing something more elaborate:
RDD_temp = RDD_total.map (some map function());
RDD_total.unpersist();
RDD_total = RDD_temp.cache();
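
For concreteness, the second variant could look roughly like this in Scala.
This is only a sketch: CacheSketch, step, iterate and the Int element type are
hypothetical names I made up for illustration, and I am assuming "0 to x" means
x iterations. The count() call is also my own assumption. I added it because
transformations are lazy, so the new RDD has to be materialized by an action
before the old cache is dropped; otherwise the unpersist would discard data
that has not been used yet.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object CacheSketch {
  // Hypothetical stand-in for "some map function" in step 2.
  def step(v: Int): Int = v + 1

  def iterate(sc: SparkContext, inputs: Seq[RDD[Int]], x: Int): RDD[Int] = {
    // Step 1: build one union over all inputs and mark it for caching.
    var total: RDD[Int] = sc.union(inputs).cache()

    // Step 2: x iterations (assumption about what "0 to x" means).
    for (_ <- 0 until x) {
      val next = total.map(step).cache()
      next.count()       // action: forces next to be computed and cached
      total.unpersist()  // previous iteration's cached blocks can go now
      total = next
    }

    // Step 3: return the final RDD, still marked as cached.
    total
  }
}
```

The extra count() costs one pass per iteration, so I am not sure this is
better than simply calling cache() each time and letting Spark evict old blocks.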
Thanks,
Yadid