I think this broadcast-cleaning (memory block removal?) timeout exception was caused by:
15/02/02 11:48:49 ERROR TaskSchedulerImpl: Lost executor 13 on small18-tap1.common.lip6.fr: remote Akka client disassociated
15/02/02 11:48:49 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 13
15/02/02 11:48:49 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 13

Does anyone have any pointers on this?

Best,
Yifan LI

> On 02 Feb 2015, at 11:47, Yifan LI <[email protected]> wrote:
>
> Thanks, Sonal.
>
> But it seems to be an error that happened when “cleaning broadcast”?
>
> BTW, what is the timeout of “[30 seconds]”? Can I increase it?
>
> Best,
> Yifan LI
>
>> On 02 Feb 2015, at 11:12, Sonal Goyal <[email protected]> wrote:
>>
>> That may be the cause of your issue. Take a look at the tuning guide[1] and
>> maybe also profile your application. See if you can reuse your objects.
>>
>> 1. http://spark.apache.org/docs/latest/tuning.html
>>
>> Best Regards,
>> Sonal
>> Founder, Nube Technologies <http://www.nubetech.co/>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>> On Sat, Jan 31, 2015 at 4:21 AM, Yifan LI <[email protected]> wrote:
>> Yes, I think so, esp. for a pregel application… do you have any suggestions?
>>
>> Best,
>> Yifan LI
>>
>>> On 30 Jan 2015, at 22:25, Sonal Goyal <[email protected]> wrote:
>>>
>>> Is your code hitting frequent garbage collection?
>>>
>>> Best Regards,
>>> Sonal
>>> Founder, Nube Technologies <http://www.nubetech.co/>
>>>
>>> <http://in.linkedin.com/in/sonalgoyal>
>>>
>>> On Fri, Jan 30, 2015 at 7:52 PM, Yifan LI <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running my GraphX application on Spark 1.2.0 (11-node cluster), and have
>>>> requested 30GB of memory per node and 100 cores for an input dataset of
>>>> around 1GB (a graph with 5 million vertices).
>>>>
>>>> But the error below always happens…
>>>>
>>>> Is there anyone who could give me some pointers?
>>>>
>>>> (BTW, the overall edge/vertex RDDs will reach more than 100GB during graph
>>>> computation, and another version of my application works well on the
>>>> same dataset, while it needs much less memory during computation)
>>>>
>>>> Thanks in advance!!!
>>>>
>>>> 15/01/29 18:05:08 ERROR ContextCleaner: Error cleaning broadcast 60
>>>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>>>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>   at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>   at scala.concurrent.Await$.result(package.scala:107)
>>>>   at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
>>>>   at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
>>>>   at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>>>>   at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
>>>>   at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
>>>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
>>>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
>>>>   at scala.Option.foreach(Option.scala:236)
>>>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
>>>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>>>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
>>>>   at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
>>>>   at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
>>>> [Stage 91:===================> (2 + 4) / 6]15/01/29 18:08:15 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
>>>> [Stage 93:================================> (29 + 20) / 49]15/01/29 23:47:03 ERROR TaskSchedulerImpl: Lost executor 9 on small11-tap1.common.lip6.fr: remote Akka client disassociated
>>>> [Stage 83:> (1 + 0) / 6][Stage 86:> (0 + 1) / 2][Stage 88:> (0 + 2) / 8]15/01/29 23:47:06 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 9
>>>> [Stage 83:===============> (5 + 1) / 6][Stage 88:=============> (9 + 2) / 11]15/01/29 23:57:30 ERROR TaskSchedulerImpl: Lost executor 8 on small10-tap1.common.lip6.fr: remote Akka client disassociated
>>>> 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8
>>>> 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8
>>>>
>>>> Best,
>>>> Yifan LI
