I have a general question on when persisting will be beneficial and when it won't:
I have a task that runs as follows:

    keyedRecordPieces = records.flatMap(record => Seq(key, recordPieces))
    partitioned = keyedRecordPieces.partitionBy(KeyPartitioner)
    partitioned.mapPartitions(doComputation).save()

Is there value in having a persist somewhere here? For example, if the flatMap step is particularly expensive, will it ever be computed twice when there are no failures?

Thanks,
Arun