Hi, I have a workflow like the one below:
rdd1 = sc.textFile(input); rdd2 = rdd1.filter(filterfunc1); rdd3 = rdd1.filter(fiterfunc2); rdd4 = rdd2.map(mapptrans1); rdd5 = rdd3.map(maptrans2); rdd6 = rdd4.union(rdd5); rdd6.foreach(some transformation); [image: Inline image 1] 1. Do I need to persist rdd1 ?Or its not required since there is only one action at rdd6 which will create only one job and in a single job no need of persist ? 2. Also what if transformation on rdd2 is reduceByKey instead of map ? Will this again the same thing no need of persist since single job. Thanks