For example:

    val originalRDD: RDD[SomeCaseClass] = ...

    // Option 1: objects are copied, setting prop1 in the process
    val transformedRDD = originalRDD.map(item => item.copy(prop1 = calculation()))

    // Option 2: objects are reused and modified in place
    // (the item has to be returned explicitly, or the map produces RDD[Unit])
    val transformedRDD = originalRDD.map { item => item.prop1 = calculation(); item }

I did a couple of small tests with option 2 and noticed less time was spent in garbage collection. It didn't add up to much, but with a large enough data set it would make a difference. It also seems that less memory would be used.

Potential gotchas:

- Objects in originalRDD are being modified, so you can't expect them to be unchanged
- You also can't rely on objects in originalRDD having the new value, because originalRDD might be recalculated
- If originalRDD were a PairRDD and you modified the keys, it could cause issues
- more?

Other than the potential gotchas, is there any reason not to reuse objects across RDDs? Is it a recommended practice for reducing memory usage and garbage collection, or not? Is it safe to do this in code you expect to work on future versions of Spark?

Thanks in advance,
Todd
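P.S. Here is a minimal, self-contained sketch of the two variants. It uses a plain Scala List as a stand-in for the RDD (so it runs without a Spark cluster), and SomeCaseClass / calculation are placeholder names from the snippet above; prop1 is declared as a var so option 2 can mutate it.

```scala
// Stand-in element type; `var` makes prop1 mutable so option 2 can modify it in place.
case class SomeCaseClass(var prop1: Int, prop2: String)

object ReuseDemo {
  // Placeholder for whatever the real computation is.
  def calculation(): Int = 42

  def main(args: Array[String]): Unit = {
    val original = List(SomeCaseClass(0, "a"), SomeCaseClass(0, "b"))

    // Option 1: copy() allocates a new object per element.
    val copied = original.map(item => item.copy(prop1 = calculation()))

    // Option 2: mutate and return the same object. Note that
    // `item => item.prop1 = calculation()` alone would return Unit,
    // so the item must be returned explicitly.
    val mutated = original.map { item => item.prop1 = calculation(); item }

    println(copied.head eq original.head)  // false: option 1 made copies
    println(mutated.head eq original.head) // true: option 2 reused the objects
    println(original.map(_.prop1))         // originals were mutated by option 2
  }
}
```

The `eq` checks make the trade-off visible: option 2 avoids the per-element allocation, but the source collection's objects now carry the new values, which is exactly the first gotcha above.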