So, to avoid unpredictable performance, I should wait until every use of the rdd has completed before calling unpersist, correct?
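
To make the question concrete, here is a minimal sketch of the pattern I have in mind, written in Python. The helper name run_then_unpersist and the FakeRdd stand-in are hypothetical and exist only for illustration; FakeRdd is not a Spark class. I am assuming the same helper would work with a real cached PySpark RDD, since the only method it calls on the rdd is unpersist(), with each job being a function that runs some action on it.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_then_unpersist(rdd, jobs, max_workers=4):
    """Run every job against rdd; call unpersist() only after all have finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job, rdd) for job in jobs]
        try:
            wait(futures)  # block until every job has completed or failed
        finally:
            rdd.unpersist()  # release the cache once nobody is using it
    # result() re-raises the exception of a failed job, if there was one
    return [f.result() for f in futures]


class FakeRdd:
    """Hypothetical stand-in for a cached RDD, used only to exercise the helper."""

    def __init__(self, data):
        self.data = list(data)
        self.cached = True

    def unpersist(self):
        self.cached = False
        return self


def total(rdd):
    # Also report whether the cache was still in place while this job ran.
    return (sum(rdd.data), rdd.cached)


def largest(rdd):
    return (max(rdd.data), rdd.cached)


fake = FakeRdd([1, 2, 3])
results = run_then_unpersist(fake, [total, largest])
```

Every job here sees the data while it is still cached, and the cache is released exactly once at the end, even if one of the jobs throws.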

On Wed, Sep 16, 2015 at 4:08 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> unpredictable. I think it will be safe (as in nothing should fail), but
> the performance will be unpredictable (some partition may use cache, some
> may not be able to use the cache).
>
> On Wed, Sep 16, 2015 at 1:06 PM, Paul Weiss <paulweiss....@gmail.com>
> wrote:
>
>> Hi,
>>
>> What is the behavior when calling rdd.unpersist() from a different thread
>> while another thread is using that rdd. Below is a simple case for this:
>>
>> 1) create rdd and load data
>> 2) call rdd.cache() to bring data into memory
>> 3) create another thread and pass rdd for a long computation
>> 4) call rdd.unpersist while 3. is still running
>>
>> Questions:
>>
>> * Will the computation in 3) finish properly even if unpersist was called
>> on the rdd while running?
>> * What happens if a part of the computation fails and the rdd needs to
>> reconstruct based on DAG lineage, will this still work even though
>> unpersist has been called?
>>
>> thanks,
>> -paul