When might that be necessary or useful? Presumably I can persist and
replicate my RDD to avoid re-computation, if that's my goal. What
advantage does checkpointing provide over disk persistence with
replication?
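
To make the comparison concrete, here is a toy sketch in plain Python of the
"clears dependencies" point from the quoted reply below. It is not Spark's
API: ToyRDD, iterate, and lineage_depth are hypothetical names invented for
illustration, and the model assumes only that persisting keeps the link to
the parent dataset while checkpointing saves the data and drops that link.

```python
# Toy model of RDD lineage. NOT Spark's API; every name here is hypothetical.

class ToyRDD:
    def __init__(self, compute, parent=None):
        self.compute = compute   # function: parent's data -> this dataset's data
        self.parent = parent     # link to the upstream dataset (the lineage)
        self.saved = None        # materialized copy of the data, if any

    def map(self, f):
        # Each transformation adds one more link to the lineage chain.
        return ToyRDD(lambda data: [f(x) for x in data], parent=self)

    def collect(self):
        if self.saved is not None:
            return self.saved
        upstream = self.parent.collect() if self.parent else None
        return self.compute(upstream)

    def persist(self):
        # Assumption: persisting stores the data but keeps the parent link.
        self.saved = self.collect()
        return self

    def checkpoint(self):
        # Assumption: checkpointing stores the data and drops the parent link.
        self.saved = self.collect()
        self.parent = None
        return self

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth, node = depth + 1, node.parent
        return depth


def iterate(steps, use_checkpoint, every=10):
    """Apply `steps` map operations, saving the data every `every` steps."""
    rdd = ToyRDD(lambda _: [1, 2, 3])
    for i in range(1, steps + 1):
        rdd = rdd.map(lambda x: x + 1)
        if i % every == 0:
            rdd = rdd.checkpoint() if use_checkpoint else rdd.persist()
    return rdd


persisted = iterate(55, use_checkpoint=False)
checkpointed = iterate(55, use_checkpoint=True)
print(persisted.lineage_depth(), checkpointed.lineage_depth())
```

Both variants avoid re-computation and return the same data, but in this
model the persisted chain keeps growing with every iteration, while the
checkpointed one is cut back each time the data is saved. In Spark itself
the corresponding calls would be SparkContext.setCheckpointDir and
RDD.checkpoint(); whether the difference matters in practice depends on how
long the lineage gets.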


On Mon, Apr 21, 2014 at 2:42 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Checkpoint clears dependencies. You might need checkpoint to cut a
> long lineage in iterative algorithms. -Xiangrui
>
> On Mon, Apr 21, 2014 at 11:34 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
> > I'm trying to understand when I would want to checkpoint an RDD rather than
> > just persist to disk.
> >
> > Every reference I can find to checkpoint relates to Spark Streaming. But
> > the method is defined in the core Spark library, not Streaming.
> >
> > Does it exist solely for streaming, or are there circumstances unrelated to
> > streaming in which I might want to checkpoint... and if so, like what?
> >
> > Thanks,
> > Diana
>
