Hi all,

I have a question about checkpointing versus persisting or saving an RDD.

Say we have one RDD containing a huge amount of data, and we need to perform a join on it. Consider three options:

1. We checkpoint the RDD and then perform the join.
2. We persist the RDD with StorageLevel.MEMORY_AND_DISK and then perform the join.
3. We save the intermediate RDD and then perform the join on the same RDD (the save is only there to keep the intermediate result before joining).

Which of these is fastest, and what are the differences between them?

Thanks,
Subash
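
P.S. To make the three options concrete, here is a rough Scala sketch of what I mean. It is only an illustration: the object and method names, the tiny sample data, the local master, and the /tmp paths are all placeholders I made up, and the real RDD is far larger.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical names throughout; this only illustrates the three options.
object IntermediateRddOptions {
  type Pairs = RDD[(Int, String)]
  type Joined = RDD[(Int, (String, String))]

  // Option 1: mark the RDD for checkpointing, then join.
  def joinAfterCheckpoint(sc: SparkContext, big: Pairs, other: Pairs, checkpointDir: String): Joined = {
    sc.setCheckpointDir(checkpointDir) // must be set before calling checkpoint()
    big.checkpoint()                   // marks the RDD; nothing is written until a job runs
    big.join(other)
  }

  // Option 2: persist with MEMORY_AND_DISK, then join.
  def joinAfterPersist(big: Pairs, other: Pairs): Joined = {
    big.persist(StorageLevel.MEMORY_AND_DISK) // lazy as well; filled by the first job
    big.join(other)
  }

  // Option 3: save the intermediate RDD, then join using the same RDD.
  def joinAfterSave(big: Pairs, other: Pairs, outputPath: String): Joined = {
    big.saveAsObjectFile(outputPath) // an action: runs a job and writes the data out
    big.join(other)                  // joins the original RDD, not the saved copy
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("intermediate-rdd-options").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Small stand-ins for the real data.
      def big: Pairs = sc.parallelize(1 to 1000).map(i => (i % 100, s"left-$i"))
      val other: Pairs = sc.parallelize(0 until 100).map(i => (i, s"right-$i"))

      // Each option gets a fresh copy of the "big" RDD so they do not affect each other.
      println(joinAfterCheckpoint(sc, big, other, "/tmp/spark-checkpoints").count())
      println(joinAfterPersist(big, other).count())
      // The output path must not already exist, or the save will fail.
      println(joinAfterSave(big, other, "/tmp/big-rdd-output").count())
    } finally {
      sc.stop()
    }
  }
}
```

In each case the count() at the end is just there to trigger the job so the join actually runs.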