Re: RDDs being cleaned too fast
RDD.persist() can be useful here.

On 11 December 2014 at 14:34, ankits [via Apache Spark User List] <ml-node+s1001560n20613...@n3.nabble.com> wrote:

> I'm using spark 1.1.0 and am seeing persisted RDDs being cleaned up too
> fast. [quoted message trimmed; the full text is in the original post in
> this thread]

--
Regards,
Harihar Nahak
BigData Developer, Wynyard
Email: hna...@wynyardgroup.com | Extn: 8019
Re: RDDs being cleaned too fast
I was having similar issues with my persisted RDDs. After some digging around, I noticed that the partitions were not balanced evenly across the available nodes. After a repartition(), the RDD was spread evenly across all the available memory. Not sure if that would help your use case, though. You could also increase spark.storage.memoryFraction if that is an option.

- Ranga

On Wed, Dec 10, 2014 at 10:23 PM, Aaron Davidson wrote:

> The ContextCleaner uncaches RDDs that have gone out of scope on the
> driver. So it's possible that the given RDD is no longer reachable in
> your program's control flow, or else it'd be a bug in the ContextCleaner.
> [earlier quoted text trimmed]
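For reference, spark.storage.memoryFraction is set like any other Spark property; a spark-defaults.conf sketch (the 0.8 value is illustrative, and 0.6 is the Spark 1.x default):

```
# Fraction of the executor heap reserved for Spark's block storage
# (Spark 1.x setting; value is illustrative, the default is 0.6)
spark.storage.memoryFraction   0.8
```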
Re: RDDs being cleaned too fast
The ContextCleaner uncaches RDDs that have gone out of scope on the driver. So it's possible that the given RDD is no longer reachable in your program's control flow, or else it'd be a bug in the ContextCleaner.

On Wed, Dec 10, 2014 at 5:34 PM, ankits wrote:

> I'm using spark 1.1.0 and am seeing persisted RDDs being cleaned up too
> fast. [quoted message trimmed; the full text is in the original post in
> this thread]
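This explanation can be illustrated with a toy model (plain Python, not Spark code): a cleaner that holds only weak references to cached objects will reclaim an entry as soon as the program drops its last strong reference, which is the "gone out of scope on the driver" behavior described above. All names here are hypothetical:

```python
import gc
import weakref

class ToyCleaner:
    """Toy ContextCleaner-style component: tracks cached objects only
    through weak references, so it never keeps them alive itself."""
    def __init__(self):
        self._cache = {}  # rdd_id -> weakref to the "RDD" object

    def register(self, rdd_id, obj):
        self._cache[rdd_id] = weakref.ref(obj)

    def sweep(self):
        """Uncache entries whose object has been garbage collected."""
        cleaned = [rid for rid, ref in self._cache.items() if ref() is None]
        for rid in cleaned:
            del self._cache[rid]
            print("Cleaned RDD %d" % rid)
        return cleaned

class ToyRDD:
    pass

cleaner = ToyCleaner()
rdd = ToyRDD()
cleaner.register(33, rdd)

assert cleaner.sweep() == []  # still strongly referenced: survives

del rdd                       # last strong reference dropped
gc.collect()
swept = cleaner.sweep()       # now eligible: prints "Cleaned RDD 33"
```

The practical takeaway is the same as in the reply above: if a persisted RDD is being cleaned unexpectedly, check whether any driver-side variable still refers to it.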
RDDs being cleaned too fast
I'm using Spark 1.1.0 and am seeing persisted RDDs being cleaned up too fast. How can I inspect the size of an RDD in memory and get more information about why it was cleaned up? There should be more than enough memory available on the cluster to store them, and by default spark.cleaner.ttl is infinite, so I want more information about why this is happening and how to prevent it.

Spark just logs this when removing RDDs:

[2014-12-11 01:19:34,006] INFO spark.storage.BlockManager [] [] - Removing RDD 33
[2014-12-11 01:19:34,010] INFO pache.spark.ContextCleaner [] [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
[2014-12-11 01:19:34,012] INFO spark.storage.BlockManager [] [] - Removing RDD 33
[2014-12-11 01:19:34,016] INFO pache.spark.ContextCleaner [] [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33