I have a general question on when persisting will be beneficial and when it
won't:

I have a task that runs as follows:

keyedRecordPieces = records.flatMap(record => Seq(key, recordPieces))
partitioned = keyedRecordPieces.partitionBy(KeyPartitioner)

partitioned.mapPartitions(doComputation).save()

Is there value in adding a persist somewhere here? For example, if the
flatMap step is particularly expensive, will it ever be computed twice when
there are no failures?

Thanks

Arun
