I've been trying to track down a problem in Spark code relating to a task with many iterations. When trying to recreate an error with simpler code, I ran into a StackOverflowError due to large lineage. The solution is to add checkpoints, but the behavior of checkpoints is not well defined in the Spark documentation. I have some questions regarding checkpoints when not using Spark Streaming.
1. How can I clean up checkpoints once a new checkpoint has been created? I have heard this is automatically done in Spark Streaming, can this be set up for programs that otherwise need nothing from Spark Streaming? 2. In the code below (runs in spark-shell with master=yarn-client) Is the map (x => x) really necessary to successfully create a checkpoint? sc.setCheckpointDir("checkpoints") var hello = sc.parallelize(Array(1.0, 2.0, 3.0)) hello.checkpoint() println(hello.isCheckpointed) // prints false hello.count() println(hello.isCheckpointed) // prints true val mappedRDD = inRDD.map(x => -x) val result = mappedRDD.reduce(_+_) hello = mappedRDD hello.checkpoint() println(hello.isCheckpointed) // prints false hello.count() println(hello.isCheckpointed) // prints false hello = mappedRDD.map(x => x) hello.checkpoint() println(hello.isCheckpointed) // prints false hello.count() println(hello.isCheckpointed) // prints true 3. Is it possible that a long lineage could cause ActorNotFound errors in the yarn logs without StackOverflow errors in those same logs (in particular, with GraphX code)? Any insight into these problems would be very appreciated. Thanks, Ian