Re: checkpointing without streaming?

Diana Carroll Mon, 21 Apr 2014 12:13:25 -0700

When might that be necessary or useful?  Presumably I can persist and
replicate my RDD to avoid re-computation, if that's my goal.  What
advantage does checkpointing provide over disk persistence with
replication?



On Mon, Apr 21, 2014 at 2:42 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Checkpoint clears dependencies. You might need checkpoint to cut a
> long lineage in iterative algorithms. -Xiangrui
>
> On Mon, Apr 21, 2014 at 11:34 AM, Diana Carroll <dcarr...@cloudera.com>
> wrote:
> > I'm trying to understand when I would want to checkpoint an RDD rather than
> > just persist to disk.
> >
> > Every reference I can find to checkpoint relates to Spark Streaming.  But
> > the method is defined in the core Spark library, not Streaming.
> >
> > Does it exist solely for streaming, or are there circumstances unrelated to
> > streaming in which I might want to checkpoint...and if so, like what?
> >
> > Thanks,
> > Diana
>
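
For anyone following the thread, here is a rough sketch of the lineage point
quoted above. It is a toy model in plain Python, not Spark's API: ToyRDD and
all of its methods are hypothetical names invented for illustration. The idea
it tries to show is that persisting keeps the data around but leaves the
dependency chain in place, while checkpointing saves the data and drops the
parents. In real Spark the corresponding calls would be
SparkContext.setCheckpointDir and RDD.checkpoint(), which write to reliable
storage such as HDFS.

```python
# Toy model of RDD lineage. NOT Spark's API: every name here is hypothetical.

class ToyRDD:
    def __init__(self, compute, parent=None):
        self.compute = compute  # function: parent's data -> this dataset's data
        self.parent = parent    # dependency on the parent dataset (the lineage)
        self.cached = None      # filled in by persist(); the parent link is kept

    def map(self, f):
        # A transformation creates a child that depends on this dataset.
        return ToyRDD(lambda data: [f(x) for x in data], parent=self)

    def collect(self):
        if self.cached is not None:
            return self.cached
        parent_data = self.parent.collect() if self.parent else None
        return self.compute(parent_data)

    def persist(self):
        # Keeps the computed data, but the lineage still points at the parents.
        self.cached = self.collect()
        return self

    def checkpoint(self):
        # Saves the data (standing in for reliable storage) and clears the
        # dependencies, so this dataset becomes the new root of the lineage.
        saved = list(self.collect())
        self.compute = lambda _: saved
        self.parent = None
        self.cached = None
        return self

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth, node = depth + 1, node.parent
        return depth


def iterate(steps, checkpoint_every=None):
    # An iterative job: each step derives a new dataset from the previous one.
    rdd = ToyRDD(lambda _: [1, 2, 3])
    for i in range(1, steps + 1):
        rdd = rdd.map(lambda x: x + 1).persist()
        if checkpoint_every and i % checkpoint_every == 0:
            rdd.checkpoint()
    return rdd


persisted_only = iterate(45)
with_checkpoint = iterate(45, checkpoint_every=10)
print(persisted_only.lineage_depth())
print(with_checkpoint.lineage_depth())
```

In this toy, both runs produce the same data, but the persist-only chain keeps
growing with every iteration, while the checkpointed one is only as long as the
number of steps since the last checkpoint.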
