Hi

I have a workflow like the one below:

rdd1 = sc.textFile(input);
rdd2 = rdd1.filter(filterfunc1);
rdd3 = rdd1.filter(filterfunc2);
rdd4 = rdd2.map(maptrans1);
rdd5 = rdd3.map(maptrans2);
rdd6 = rdd4.union(rdd5);
rdd6.foreach(some transformation);

   1. Do I need to persist rdd1? Or is that unnecessary, since the only
   action is the one on rdd6, which creates a single job, and persisting
   is not needed within a single job?
   2. What if the transformation on rdd2 is reduceByKey instead of map?
   Is the answer the same, i.e. no need to persist because it is still a
   single job?
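
To make the question concrete, here is a toy model in plain Python of why I am unsure. It is not Spark: LazyDataset, run_workflow and read_source are hypothetical names, and the filter/map functions are stand-ins for the ones above. In this model each branch of the union pulls from rdd1 on its own, so the source is read twice within a single action unless the parent is cached. Whether Spark behaves the same way is exactly what I am asking.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records lineage, computes only on an action."""

    def __init__(self, compute):
        self._compute = compute    # zero-argument function returning an iterator
        self._persisted = False
        self._cache = None         # filled on first evaluation if persisted

    def persist(self):
        self._persisted = True
        return self

    def _iterate(self):
        if self._persisted:
            if self._cache is None:
                self._cache = list(self._compute())
            return iter(self._cache)
        return self._compute()     # no cache: recompute from the lineage

    def filter(self, pred):
        return LazyDataset(lambda: (x for x in self._iterate() if pred(x)))

    def map(self, fn):
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def union(self, other):
        def both():
            yield from self._iterate()
            yield from other._iterate()
        return LazyDataset(both)

    def foreach(self, fn):
        # The only "action": everything above runs when this is called.
        for x in self._iterate():
            fn(x)


def run_workflow(persist_parent):
    """Return (number of times the source was read, collected output)."""
    reads = {"count": 0}

    def read_source():             # stand-in for sc.textFile(input)
        reads["count"] += 1
        return iter(range(10))

    rdd1 = LazyDataset(read_source)
    if persist_parent:
        rdd1.persist()
    rdd2 = rdd1.filter(lambda x: x % 2 == 0)   # stand-in for filterfunc1
    rdd3 = rdd1.filter(lambda x: x % 2 == 1)   # stand-in for filterfunc2
    rdd4 = rdd2.map(lambda x: x * 10)          # stand-in for maptrans1
    rdd5 = rdd3.map(lambda x: -x)              # stand-in for maptrans2
    rdd6 = rdd4.union(rdd5)

    out = []
    rdd6.foreach(out.append)
    return reads["count"], out
```

In this model, run_workflow(False) reads the source twice and run_workflow(True) reads it once, with identical output in both cases.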


Thanks