Hi Matei,

Thank you for the clarification. I agree that users can always call
rdd.unpersist() to evict an RDD's data from the BlockManager. But if the
rdd object becomes unreachable, its data in the BlockManager becomes
unusable. Ideally, the BlockManager should evict such unusable data
before any other data. In effect it is a memory leak in the
BlockManager, although the LRU policy ensures the data does eventually
get evicted.
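To make the idea concrete, here is a toy sketch in Python (not Spark
code; the names BlockCache and Owner are made up for illustration). A
weak-reference callback lets the cache drop a block as soon as its owner
object is garbage collected, which is the proactive eviction I am
suggesting, instead of waiting for plain LRU pressure:

```python
import gc
import weakref
from collections import OrderedDict

class Owner:
    """Stands in for an RDD object on the driver (hypothetical name)."""
    def __init__(self, block_id):
        self.block_id = block_id

class BlockCache:
    """Toy LRU cache. Without the weakref callback, a block whose owner
    was garbage collected would be unusable yet still occupy space
    until ordinary LRU eviction reached it."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, LRU order

    def put(self, owner, data):
        # When `owner` is collected, evict its block immediately.
        weakref.finalize(owner, self.blocks.pop, owner.block_id, None)
        self.blocks[owner.block_id] = data
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # plain LRU eviction

    def get(self, block_id):
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)  # mark recently used
        return data

cache = BlockCache(capacity=2)
a = Owner("rdd_1")
cache.put(a, b"partition data A")
b = Owner("rdd_2")
cache.put(b, b"partition data B")

del a         # the "RDD" becomes unreachable...
gc.collect()  # ...and its now-unusable block is dropped first,
              # while rdd_2's still-usable block stays cached
```

This is only a sketch of the policy, of course; in Spark the analogous
hook would have to live on the driver and send block removal messages
to the workers.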

Is this a big deal? If so, I can send a pull request to fix it.

Thanks,
Meisam

On Wed, Nov 13, 2013 at 11:37 PM, Matei Zaharia <[email protected]> wrote:
> Hi Meisam,
>
> Each block manager removes data from the cache in a least-recently-used 
> fashion as space fills up. If you’d like to remove an RDD manually before 
> that, you can call rdd.unpersist().
>
> Matei
>
> On Nov 13, 2013, at 8:15 PM, Meisam Fathi <[email protected]> wrote:
>
>> Hi Community,
>>
>> When an RDD in the application becomes unreachable and gets garbage
>> collected, how does Spark remove RDD's data from BlockManagers on the
>> worker nodes?
>>
>> Thanks,
>> Meisam
>