I am writing a Spark application that runs many iterations.
I plan to checkpoint every Nth iteration to truncate the lineage graph of my
RDD and allow previous shuffle files to be cleaned up.
I would also like to be able to restart my application completely using the
last checkpoint.
I understand that regular checkpointing…
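The plan described above can be sketched roughly as follows. This is a minimal, hypothetical example, not a full application: the checkpoint directory path, the iteration count, and the `update` function are all placeholders. Note that `RDD.checkpoint()` only marks the RDD; it is materialized by the next action, so an action is forced immediately afterwards to actually cut the lineage.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val sc = new SparkContext(new SparkConf().setAppName("iterative-checkpoint"))
// Must be reliable storage (e.g. HDFS) if you want to recover across restarts.
sc.setCheckpointDir("hdfs:///tmp/checkpoints")

val N = 10                                   // checkpoint interval (placeholder)
def update(x: Int): Int = x + 1              // stand-in for one iteration's logic

var rdd: RDD[Int] = sc.parallelize(1 to 1000)
for (i <- 1 to 100) {
  rdd = rdd.map(update)
  if (i % N == 0) {
    rdd.checkpoint()   // mark for checkpointing; lazy until an action runs
    rdd.count()        // force materialization so the lineage is actually truncated
  }
}
```

Restarting from the last checkpoint is a separate concern: the files written under the checkpoint directory would have to be located and reloaded by the restarted driver, which plain `checkpoint()` does not do for you automatically.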
Hi,
I am writing an app in Spark (1.6.1) in which I am using an accumulator.
My accumulator is simply counting rows: acc += 1.
My test processes 4 files, each with 4 rows; however, the final value of the
accumulator is not 16 and, worse, is inconsistent between runs.
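For reference, the setup described above looks roughly like this. This is a hedged sketch with a placeholder input path, not the original code. One detail worth noting: Spark only guarantees that each task's accumulator updates are applied exactly once when the update happens inside an action (such as `foreach`); updates made inside transformations (such as `map`) can be applied more than once if a task is re-executed or a stage is recomputed, which can produce exactly the kind of inconsistent counts described.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("acc-count"))
// Spark 1.6-style accumulator API (replaced by AccumulatorV2 in 2.x).
val acc = sc.accumulator(0L, "rows")

val rows = sc.textFile("hdfs:///data/test/*")  // placeholder for the 4 test files
rows.foreach { _ => acc += 1 }                 // action: updates counted once per row
println(acc.value)                             // read only on the driver
```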
Are accumulators not t