Re: RDD cache question

Mark Hamstra Sat, 30 Nov 2013 18:25:39 -0800

Your question doesn't really make sense without specifying where any
RDD actions take place (i.e., where Spark jobs are actually run).  Without
any actions, all you've outlined so far are different ways to specify the
chain of transformations that should be evaluated when an action is
eventually called and a job runs.  In a real sense your code hasn't
actually done anything yet.
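
To make the point concrete, here is a rough sketch in plain Python. It is a
toy model only, not the Spark API: LazyDataset, add_one and the calls counter
are all hypothetical names made up for illustration. It assumes the usual
behavior that map() merely records a transformation, cache() merely marks a
dataset to be kept, and only an action (collect() here) actually runs anything.

```python
# Toy stand-in for an RDD (NOT Spark): transformations are recorded lazily,
# and work only happens when an action such as collect() is called.
calls = {"n": 0}  # counts how many times the map function really executes


class LazyDataset:
    def __init__(self, parent=None, func=None, data=None):
        self.parent = parent      # upstream dataset, or None for a source
        self.func = func          # function applied to each parent element
        self.data = data          # raw data, only set on a source dataset
        self.want_cache = False   # set by cache(); nothing is stored yet
        self.cached = None        # filled in by the first action, if requested

    def map(self, func):
        # Transformation: just extends the chain, touches no data.
        return LazyDataset(parent=self, func=func)

    def cache(self):
        # Only a marker; the result is stored when an action first computes it.
        self.want_cache = True
        return self

    def collect(self):
        # Action: walks the chain and evaluates it (or reuses a stored result).
        if self.cached is not None:
            return self.cached
        if self.parent is None:
            result = list(self.data)
        else:
            result = [self.func(x) for x in self.parent.collect()]
        if self.want_cache:
            self.cached = result
        return result


def add_one(x):
    calls["n"] += 1
    return x + 1


total = LazyDataset(data=[1, 2, 3])
for _ in range(3):
    total = total.map(add_one)       # three chained maps, still no work done

calls_before_action = calls["n"]     # nothing has run so far
first = total.cache().collect()      # first action: the whole chain runs
calls_after_first = calls["n"]
second = total.collect()             # second action: served from the cache
calls_after_second = calls["n"]
```

In this sketch the loop of map() calls costs nothing by itself, and the
cache() marker only pays off when a second action reuses the same dataset.
Whether caching helps in your program therefore depends on where the actions
are, which is the piece missing from the question.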



On Sat, Nov 30, 2013 at 6:01 PM, Yadid Ayzenberg <[email protected]> wrote:

> Hi All,
>
> I'm trying to implement the following and would like to know where
> I should be calling RDD.cache():
>
> Suppose I have a group of RDDs : RDD1 to RDDn as input.
>
> 1. create a single RDD_total = RDD1.union(RDD2)..union(RDDn)
>
> 2. for i = 0 to x:    RDD_total = RDD_total.map (some map function());
>
> 3. return RDD_total.
>
> I think that I should cache RDD_total in order to optimize the iterations.
> Should I just be calling RDD_total.cache() at the end of each iteration?
> Or should I be performing something more elaborate:
>
>
> RDD_temp = RDD_total.map (some map function());
> RDD_total.unpersist();
> RDD_total = RDD_temp.cache();
>
>
>
> Thanks,
> Yadid
