Re: What's the lifecycle of an rdd? Can I control it?
persist and unpersist. unpersist: "Mark the RDD as non-persistent, and remove all blocks for it from memory and disk."

2014-03-19 16:40 GMT+08:00 林武康 vboylin1...@gmail.com:
Hi, can anyone tell me about the lifecycle of an RDD? I searched the official website and still can't figure it out. Can I use an RDD in some stages and then destroy it to release memory, since no later stage will use that RDD any more? Is that possible? Thanks!
Sincerely, Lin wukang
Related question: if I keep creating new RDDs and cache()-ing them, does Spark automatically unpersist the least recently used RDD when it runs out of memory? Or is an explicit unpersist the only way to get rid of an RDD (barring the PR Tathagata mentioned)? Also, does unpersist()-ing an RDD immediately free up space, or just allow that space to be reclaimed when needed?

On Wed, Mar 19, 2014 at 7:01 PM, Tathagata Das tathagata.das1...@gmail.com wrote:
Just a heads-up: there is an active pull request (https://github.com/apache/spark/pull/126) that will automatically unpersist RDDs that are no longer referenced or in scope in the application.
TD

On Wed, Mar 19, 2014 at 6:58 PM, hequn cheng chenghe...@gmail.com wrote:
persist and unpersist. unpersist: "Mark the RDD as non-persistent, and remove all blocks for it from memory and disk."
Yes, Spark automatically removes old RDDs from the cache when you make new ones. Unpersist forces it to remove them right away. In both cases, though, note that Java doesn't garbage-collect the released objects until later.

Matei

On Mar 19, 2014, at 7:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:
Related question: if I keep creating new RDDs and cache()-ing them, does Spark automatically unpersist the least recently used RDD when it runs out of memory?
Okie doke, good to know.

On Wed, Mar 19, 2014 at 7:35 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Yes, Spark automatically removes old RDDs from the cache when you make new ones. Unpersist forces it to remove them right away.