Re: What's the lifecycle of an rdd? Can I control it?

2014-03-19 Thread hequn cheng
Use persist and unpersist.
unpersist: Mark the RDD as non-persistent, and remove all blocks for it from
memory and disk.
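
For concreteness, a minimal Scala sketch of that lifecycle (the SparkContext
sc and the input path are placeholders, not something from this thread):

  import org.apache.spark.storage.StorageLevel

  val lines = sc.textFile("hdfs:///some/input")  // placeholder path
  val words = lines.flatMap(_.split(" "))

  words.persist(StorageLevel.MEMORY_AND_DISK)    // keep blocks around across jobs
  words.count()                                  // job 1: computes and caches words
  words.distinct().count()                       // job 2: reuses the cached blocks
  words.unpersist()                              // done: drop its blocks from memory and disk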


2014-03-19 16:40 GMT+08:00 林武康 vboylin1...@gmail.com:

  Hi, can anyone tell me about the lifecycle of an RDD? I searched through
 the official website and still can't figure it out. Can I use an RDD in
 some stages and then destroy it to release memory, since no later stages
 will use that RDD any more? Is it possible?

 Thanks!

 Sincerely
 Lin wukang



Re: What's the lifecycle of an rdd? Can I control it?

2014-03-19 Thread Nicholas Chammas
Related question:

If I keep creating new RDDs and cache()-ing them, does Spark automatically
unpersist the least recently used RDD when it runs out of memory? Or is an
explicit unpersist the only way to get rid of an RDD (barring the PR
Tathagata mentioned)?

Also, does unpersist()-ing an RDD immediately free up space, or just allow
that space to be reclaimed when needed?


On Wed, Mar 19, 2014 at 7:01 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:

 Just a heads-up, there is an active pull request
 (https://github.com/apache/spark/pull/126) that will automatically
 unpersist RDDs that are no longer referenced or in scope in the
 application.

 TD


 On Wed, Mar 19, 2014 at 6:58 PM, hequn cheng chenghe...@gmail.com wrote:

 Use persist and unpersist.
 unpersist: Mark the RDD as non-persistent, and remove all blocks for it
 from memory and disk.


 2014-03-19 16:40 GMT+08:00 林武康 vboylin1...@gmail.com:

   Hi, can anyone tell me about the lifecycle of an RDD? I searched
 through the official website and still can't figure it out. Can I use an
 RDD in some stages and then destroy it to release memory, since no later
 stages will use that RDD any more? Is it possible?

 Thanks!

 Sincerely
 Lin wukang






Re: What's the lifecycle of an rdd? Can I control it?

2014-03-19 Thread Matei Zaharia
Yes, Spark automatically removes old RDDs from the cache when you make new
ones. Unpersist forces it to remove them right away. In both cases, though,
note that Java doesn't garbage-collect the released objects until later.
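
A small sketch of the difference (the RDDs here are made up, and the blocking
flag assumes a Spark version whose unpersist exposes it):

  val a = sc.parallelize(1 to 1000000).cache()
  a.count()                       // materializes a's blocks in memory

  val b = sc.parallelize(1 to 1000000).cache()
  b.count()                       // if memory runs low, Spark can evict a's
                                  // blocks (least recently used) to make room

  a.unpersist(blocking = true)    // or drop a's blocks explicitly, right away
  // The blocks leave the cache immediately, but the JVM only reclaims the
  // underlying objects at the next garbage collection.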

Matei

On Mar 19, 2014, at 7:22 PM, Nicholas Chammas nicholas.cham...@gmail.com 
wrote:

 Related question: 
 
 If I keep creating new RDDs and cache()-ing them, does Spark automatically 
 unpersist the least recently used RDD when it runs out of memory? Or is an 
 explicit unpersist the only way to get rid of an RDD (barring the PR 
 Tathagata mentioned)?
 
 Also, does unpersist()-ing an RDD immediately free up space, or just allow 
 that space to be reclaimed when needed?
 
 
 On Wed, Mar 19, 2014 at 7:01 PM, Tathagata Das tathagata.das1...@gmail.com 
 wrote:
 Just a heads-up, there is an active pull request that will automatically
 unpersist RDDs that are no longer referenced or in scope in the application.
 
 TD
 
 
 On Wed, Mar 19, 2014 at 6:58 PM, hequn cheng chenghe...@gmail.com wrote:
 Use persist and unpersist.
 unpersist: Mark the RDD as non-persistent, and remove all blocks for it from
 memory and disk.
 
 
 2014-03-19 16:40 GMT+08:00 林武康 vboylin1...@gmail.com:
 
 Hi, can anyone tell me about the lifecycle of an RDD? I searched through the
 official website and still can't figure it out. Can I use an RDD in some
 stages and then destroy it to release memory, since no later stages will use
 that RDD any more? Is it possible?
 
 Thanks!
 
 Sincerely 
 Lin wukang
 
 
 



Re: What's the lifecycle of an rdd? Can I control it?

2014-03-19 Thread Nicholas Chammas
Okie doke, good to know.


On Wed, Mar 19, 2014 at 7:35 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

 Yes, Spark automatically removes old RDDs from the cache when you make new
 ones. Unpersist forces it to remove them right away. In both cases, though,
 note that Java doesn't garbage-collect the released objects until later.

 Matei

 On Mar 19, 2014, at 7:22 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Related question:

 If I keep creating new RDDs and cache()-ing them, does Spark automatically
 unpersist the least recently used RDD when it runs out of memory? Or is an
 explicit unpersist the only way to get rid of an RDD (barring the PR
 Tathagata mentioned)?

 Also, does unpersist()-ing an RDD immediately free up space, or just allow
 that space to be reclaimed when needed?


 On Wed, Mar 19, 2014 at 7:01 PM, Tathagata Das 
 tathagata.das1...@gmail.com wrote:

 Just a heads-up, there is an active pull request
 (https://github.com/apache/spark/pull/126) that will automatically
 unpersist RDDs that are no longer referenced or in scope in the
 application.

 TD


 On Wed, Mar 19, 2014 at 6:58 PM, hequn cheng chenghe...@gmail.com wrote:

 Use persist and unpersist.
 unpersist: Mark the RDD as non-persistent, and remove all blocks for it
 from memory and disk.


 2014-03-19 16:40 GMT+08:00 林武康 vboylin1...@gmail.com:

   Hi, can anyone tell me about the lifecycle of an RDD? I searched
 through the official website and still can't figure it out. Can I use an
 RDD in some stages and then destroy it to release memory, since no later
 stages will use that RDD any more? Is it possible?

 Thanks!

 Sincerely
 Lin wukang