When might that be necessary or useful? Presumably I can persist and replicate my RDD to avoid re-computation, if that's my goal. What advantage does checkpointing provide over disk persistence with replication?
On Mon, Apr 21, 2014 at 2:42 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Checkpoint clears dependencies. You might need checkpoint to cut a
> long lineage in iterative algorithms. -Xiangrui
>
> On Mon, Apr 21, 2014 at 11:34 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
> > I'm trying to understand when I would want to checkpoint an RDD rather than
> > just persist to disk.
> >
> > Every reference I can find to checkpoint related to Spark Streaming. But
> > the method is defined in the core Spark library, not Streaming.
> >
> > Does it exist solely for streaming, or are there circumstances unrelated to
> > streaming in which I might want to checkpoint...and if so, like what?
> >
> > Thanks,
> > Diana
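
For what it's worth, here is a rough sketch of the "checkpoint clears dependencies" point from the quoted reply. It is plain Python, not Spark code: `ToyRDD` and `iterate` are made-up names for illustration, and the toy materializes data eagerly, which is a simplification. The only thing it tries to show is that persisting keeps the chain of parent dependencies while checkpointing drops it, so an iterative job that checkpoints periodically never carries more than a short lineage.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, data=None, parent=None, fn=None):
        self._data = data     # materialized data, if any
        self.parent = parent  # dependency on the previous dataset
        self.fn = fn          # transformation applied to the parent's data

    def map(self, fn):
        # A transformation only records lineage; nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda xs: [fn(x) for x in xs])

    def compute(self):
        # Use saved data if present, otherwise recompute from the parent.
        if self._data is not None:
            return self._data
        return self.fn(self.parent.compute())

    def persist(self):
        # Keep a copy of the result, but keep the dependency chain too.
        self._data = self.compute()
        return self

    def checkpoint(self):
        # Save the result and forget where it came from.
        self._data = self.compute()
        self.parent = None
        self.fn = None
        return self

    def lineage_depth(self):
        # Number of parent links reachable from this dataset.
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth


def iterate(steps, checkpoint_every=None):
    """Apply `steps` map operations, optionally checkpointing every N steps."""
    rdd = ToyRDD(data=[1, 2, 3])
    for i in range(1, steps + 1):
        rdd = rdd.map(lambda x: x + 1)
        if checkpoint_every and i % checkpoint_every == 0:
            rdd.checkpoint()
    return rdd


plain = iterate(55)                       # lineage grows with every step
persisted = iterate(55).persist()         # data saved, lineage still 55 deep
cut = iterate(55, checkpoint_every=10)    # lineage cut at steps 10, 20, ..., 50

print(plain.lineage_depth(), persisted.lineage_depth(), cut.lineage_depth())
```

All three variants compute the same values; they differ only in how long a dependency chain each one keeps. In real Spark, a checkpoint directory has to be set with `SparkContext.setCheckpointDir` before calling `checkpoint()` on an RDD.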