Re: how to clean shuffle write each iteration

2015-03-03 Thread lisendong
In ALS, I guess each iteration's RDDs are referenced by the next
iteration's RDDs, so none of the shuffle data will be deleted until the ALS job
finishes.

I guess checkpointing could solve my problem. Are you familiar with checkpointing?
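
The reasoning above can be illustrated with a toy model in plain Python. This is only a sketch and not Spark code: `Dataset`, `shuffle_on_disk`, `run`, and `checkpoint_interval` are hypothetical names invented for the illustration. It assumes each iteration's dataset keeps a reference to its parent, and that a checkpoint saves the data and cuts that reference. In Spark itself the related calls are `SparkContext.setCheckpointDir` and `RDD.checkpoint`; whether ALS checkpoints internally depends on the Spark version.

```python
import gc
import weakref

# Stand-in for shuffle files on disk, keyed by the iteration that wrote them.
shuffle_on_disk = set()


class Dataset:
    """Hypothetical stand-in for an RDD that references its parent."""

    def __init__(self, iteration, parent=None):
        self.iteration = iteration
        self.parent = parent
        shuffle_on_disk.add(iteration)
        # When this object is garbage collected, drop its "shuffle file".
        weakref.finalize(self, shuffle_on_disk.discard, iteration)


def run(iterations, checkpoint_interval=None):
    """Return the peak number of shuffle outputs retained at one time."""
    gc.collect()
    shuffle_on_disk.clear()
    current, peak = None, 0
    for i in range(1, iterations + 1):
        # Each new dataset keeps the previous one reachable through `parent`.
        current = Dataset(i, parent=current)
        peak = max(peak, len(shuffle_on_disk))
        if checkpoint_interval and i % checkpoint_interval == 0:
            # Toy checkpoint: the data is assumed saved, so the lineage is cut
            # and every earlier dataset becomes unreachable.
            current.parent = None
            gc.collect()
    return peak


peak_without = run(30)
peak_with = run(30, checkpoint_interval=5)
print(peak_without, peak_with)
```

In this toy model the run without checkpoints retains all 30 outputs at its peak, while cutting the lineage every 5 iterations keeps the peak at 6.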

 On March 3, 2015, at 4:18 PM, nitin [via Apache Spark User List] 
 ml-node+s1001560n21889...@n3.nabble.com wrote:
 
 Shuffle output is cleaned up once no object references it, directly or 
 indirectly. Spark has a built-in cleaner that periodically checks weak 
 references to RDDs, shuffle output, and broadcasts, and deletes the data 
 whose owners have been garbage collected. 
 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-clean-shuffle-write-each-iteration-tp21886p21890.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: how to clean shuffle write each iteration

2015-03-03 Thread nitin
Shuffle output is cleaned up once no object references it, directly or
indirectly. Spark has a built-in cleaner that periodically checks weak
references to RDDs, shuffle output, and broadcasts, and deletes the data
whose owners have been garbage collected.
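
The weak-reference idea described here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Spark's implementation: `ShuffleOutput`, `live_files`, and `register` are hypothetical names, and the "files" are just entries in a set.

```python
import gc
import weakref


class ShuffleOutput:
    """Hypothetical handle for the output of one shuffle."""

    def __init__(self, shuffle_id):
        self.shuffle_id = shuffle_id


# Stand-in for the shuffle files currently on disk.
live_files = set()


def register(handle):
    """Track a handle weakly and clean up its file once it is collected."""
    live_files.add(handle.shuffle_id)
    # The callback receives only the id, so it does not keep the handle alive.
    weakref.finalize(handle, live_files.discard, handle.shuffle_id)


handle = ShuffleOutput(0)
register(handle)
while_referenced = set(live_files)  # the file stays while the handle is reachable

del handle    # drop the last strong reference
gc.collect()  # force a collection so the finalizer has run
after_release = set(live_files)
```

The cleanup only happens after the last strong reference is gone, which is why a chain of objects that still point at an old shuffle keeps its data alive.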



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-clean-shuffle-write-each-iteration-tp21886p21889.html

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



how to clean shuffle write each iteration

2015-03-02 Thread lisendong
I'm using Spark ALS.

I set the number of iterations to 30.

In each iteration, the tasks produce nearly 1 TB of shuffle write.

To my surprise, this shuffle data is not cleaned up until the whole job
finishes, which means I need 30 TB of disk to store it.


I think that after each iteration we could delete the shuffle data written
by earlier iterations, right?

How can I do this?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-clean-shuffle-write-each-iteration-tp21886.html