Yes, in local mode the driver and executor will be the same process, and in that case the Java options set in the SparkConf configuration will not take effect.
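For illustration, here is a rough sketch (assuming the app is launched with spark-submit; the app name, flags, and batch interval are just examples) of how JVM options can still reach the single local-mode process, and why setting them in SparkConf after the JVM has started does not:

// Launch time is the only point where GC flags can reach a local-mode app,
// e.g. (hypothetical command line):
//   spark-submit --driver-java-options "-XX:+UseConcMarkSweepGC" --class MyApp app.jar
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[4]")           // driver and "executor" share this one JVM
  .setAppName("FlumeStreamingApp") // hypothetical app name
  // Ignored in local mode: the JVM that would act as the executor is already running.
  .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")

val ssc = new StreamingContext(conf, Seconds(10))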
On Wed, Apr 8, 2015 at 1:44 PM, N B <nb.nos...@gmail.com> wrote:

> Since we are running in local mode, won't all the executors be in the same
> JVM as the driver?
>
> Thanks
> NB
>
> On Wed, Apr 8, 2015 at 1:29 PM, Tathagata Das <t...@databricks.com> wrote:
>
>> It does take effect on the executors, not on the driver. That is okay,
>> because the executors hold all the data and therefore tend to have GC
>> issues, which is not usually the case for the driver. If you want to be
>> doubly sure, print the JVM flags
>> (e.g. http://stackoverflow.com/questions/10486375/print-all-jvm-flags).
>>
>> However, the GC I was referring to, the one that initiates the RDD and
>> shuffle cleanup, is the GC on the driver. Thought I would clarify.
>>
>> TD
>>
>> On Wed, Apr 8, 2015 at 1:23 PM, N B <nb.nos...@gmail.com> wrote:
>>
>>> Hi TD,
>>>
>>> Thanks for the response. Since you mentioned GC, this got me thinking.
>>>
>>> Given that we are running in local mode (all in a single JVM) for now,
>>> does the option "spark.executor.extraJavaOptions" set to
>>> "-XX:+UseConcMarkSweepGC" inside the SparkConf object take effect at all
>>> before we use it to create the StreamingContext? I ask because that is
>>> what we are doing right now. If not, perhaps we have not been running
>>> with Concurrent Mark Sweep at all. Is that recommended instead of
>>> forcing GC periodically?
>>>
>>> Thanks
>>> NB
>>>
>>> On Wed, Apr 8, 2015 at 10:20 AM, Tathagata Das <t...@databricks.com>
>>> wrote:
>>>
>>>> There are a couple of options. Increase the timeout (see the Spark
>>>> configuration).
>>>>
>>>> Also see past mails in the mailing list.
>>>>
>>>> Another option you may try (I have a gut feeling it may work, but I am
>>>> not sure) is calling GC on the driver periodically. The cleanup of RDDs
>>>> and shuffle data is tied to the GC of the corresponding RDD objects, and
>>>> regular GC may keep things clean more steadily rather than in
>>>> unpredictable bursts of GC activity. (A rough sketch of both options is
>>>> appended at the end of this thread.)
>>>>
>>>> Let us know how it works out.
>>>>
>>>> TD
>>>>
>>>> On Tue, Apr 7, 2015 at 6:00 PM, Nikunj Bansal <nb.nos...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a standalone, local-mode Spark Streaming process where we are
>>>>> reading inputs using FlumeUtils. Our longest window size is 6 hours.
>>>>> After about a day and a half of running without any issues, we start
>>>>> seeing Timeout errors while cleaning up input blocks. This seems to
>>>>> cause reading from Flume to cease.
>>>>>
>>>>> ERROR sparkDriver-akka.actor.default-dispatcher-78
>>>>> BlockManagerSlaveActor.logError - Error in removing block
>>>>> input-0-1428182594000
>>>>> org.apache.spark.SparkException: Error sending message [message =
>>>>> UpdateBlockInfo(BlockManagerId(<driver>, localhost,
>>>>> 55067),input-0-1428182594000,StorageLevel(false, false, false, false,
>>>>> 1),0,0,0)]
>>>>> at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:201)
>>>>> at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
>>>>> at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:62)
>>>>> at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:385)
>>>>> at org.apache.spark.storage.BlockManager.reportBlockStatus(BlockManager.scala:361)
>>>>> at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1105)
>>>>> at org.apache.spark.storage.BlockManagerSlaveActor$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$1.apply$mcZ$sp(BlockManagerSlaveActor.scala:44)
>>>>> at org.apache.spark.storage.BlockManagerSlaveActor$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$1.apply(BlockManagerSlaveActor.scala:43)
>>>>> at org.apache.spark.storage.BlockManagerSlaveActor$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$1.apply(BlockManagerSlaveActor.scala:43)
>>>>> at org.apache.spark.storage.BlockManagerSlaveActor$$anonfun$1.apply(BlockManagerSlaveActor.scala:76)
>>>>> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>>> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>>> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>>>>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>> at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:169)
>>>>> at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
>>>>> at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:167)
>>>>> at scala.concurrent.Await$.result(package.scala:107)
>>>>> at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>>>> ... 17 more
>>>>>
>>>>> There was a similar query posted here:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Block-removal-causes-Akka-timeouts-td15632.html
>>>>> but did not find any resolution to that issue.
>>>>>
>>>>> Thanks in advance,
>>>>> NB
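For reference, below is a rough sketch of the two workarounds suggested earlier in the thread: raising the ask timeout and periodically triggering GC on the driver. The property name spark.akka.askTimeout assumes a Spark 1.x build on the Akka transport (its 30-second default matches the timeout in the trace above), and the 300-second value, one-minute GC interval, and app name are arbitrary illustrations, not recommendations:

import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// 1. Raise the ask timeout used for block-manager messages (illustrative value).
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("FlumeStreamingApp")     // hypothetical app name
  .set("spark.akka.askTimeout", "300") // seconds; the default is 30

val ssc = new StreamingContext(conf, Seconds(10))

// 2. Periodically request a GC on the driver so that RDD/shuffle cleanup runs
// regularly instead of in large, unpredictable bursts. Note System.gc() is
// only a hint to the JVM.
val gcScheduler = Executors.newSingleThreadScheduledExecutor()
gcScheduler.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = System.gc()
}, 1, 1, TimeUnit.MINUTES)

Whether either change actually cures the block-removal timeouts is something only a longer run of the job can confirm.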