So to avoid any performance problems, I should wait for all use of the
rdd to complete before calling unpersist, correct?
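
The pattern I have in mind could be sketched roughly as below. This is only
a minimal sketch, not tested against a real cluster: the helper name
run_then_unpersist and the jobs list are hypothetical, and the only things
it assumes about the rdd are the cache() and unpersist() methods. It also
assumes each job is an action submitted from its own driver-side thread.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_then_unpersist(rdd, jobs, max_workers=4):
    """Run every job against the cached rdd, then unpersist it.

    Hypothetical helper: `rdd` only needs cache() and unpersist() methods,
    and `jobs` is a list of callables that each take the rdd.
    """
    rdd.cache()
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(job, rdd) for job in jobs]
            # Block until every job has either finished or failed.
            wait(futures)
        # result() re-raises the exception of any job that failed.
        return [f.result() for f in futures]
    finally:
        # Reached only after all jobs are done, so no running job
        # loses its cached partitions halfway through.
        rdd.unpersist()
```

The unpersist call sits in the finally block so the cache is released even
if one of the jobs throws, but never while a job is still running.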

On Wed, Sep 16, 2015 at 4:08 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> Unpredictable. I think it will be safe (as in, nothing should fail), but
> the performance will be unpredictable (some partitions may use the cache,
> some may not be able to).
>
> On Wed, Sep 16, 2015 at 1:06 PM, Paul Weiss <paulweiss....@gmail.com>
> wrote:
>
>> Hi,
>>
>> What is the behavior when calling rdd.unpersist() from a different thread
>> while another thread is using that rdd? Below is a simple case:
>>
>> 1) create rdd and load data
>> 2) call rdd.cache() to bring data into memory
>> 3) create another thread and pass rdd for a long computation
>> 4) call rdd.unpersist() while 3) is still running
>>
>> Questions:
>>
>> * Will the computation in 3) finish properly even if unpersist was called
>> on the rdd while running?
>> * What happens if a part of the computation fails and the rdd needs to
>> reconstruct based on DAG lineage, will this still work even though
>> unpersist has been called?
>>
>> thanks,
>> -paul
>>
>
>
