Using RDD.checkpoint to recover app failure

2016-07-21 Thread harelglik
I am writing a Spark application that runs many iterations. I plan to checkpoint on every Nth iteration to truncate the lineage graph of my RDD and clear previous shuffle files. I would also like to be able to restart my application completely from the last checkpoint. I understand that regular checkpoi

Using accumulators in Local mode for testing

2016-07-11 Thread harelglik
Hi, I am writing an app in Spark (1.6.1) in which I am using an accumulator. My accumulator simply counts rows: acc += 1. My test processes 4 files, each with 4 rows; however, the value of the accumulator at the end is not 16 and, even worse, is inconsistent between runs. Are accumulators not t