Github user BryanCutler commented on the pull request:
https://github.com/apache/spark/pull/6205#issuecomment-115108550
Hey @squito and @zsxwing, I just came across a weird problem. After
running the Spark shell, I noticed this exception being thrown:
```
15/06/24 21:37:01 ERROR Utils: Uncaught exception in thread heartbeat-receiver-event-loop-thread
org.apache.spark.SparkException: Error sending message [message = (BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51109)),600 seconds)]
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:118)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
	at org.apache.spark.scheduler.DAGScheduler.executorHeartbeatReceived(DAGScheduler.scala:188)
	at org.apache.spark.scheduler.TaskSchedulerImpl.executorHeartbeatReceived(TaskSchedulerImpl.scala:371)
	at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2$$anonfun$run$2.apply$mcV$sp(HeartbeatReceiver.scala:107)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1264)
	at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2.run(HeartbeatReceiver.scala:106)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Unmatched message (BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51109)),600 seconds) from Actor[akka://sparkDriver/temp/$To]
```
I traced it down to the `BlockManagerHeartbeat` message
[here](https://github.com/BryanCutler/spark/blob/configTimeout-6980/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L189).
At first I couldn't figure out why the message was not matched, or how it even
compiled, but it turns out the two arguments are being adapted into a tuple of
the original message and the `FiniteDuration`, and that tuple is what gets sent
as the message (nice, haha!). I checked and didn't see this happening anywhere
else, and once I fixed it up the problem went away. So my questions are:
* Is this something that we need to prevent from accidentally happening
again?
* To fix this line, can we put the 600 seconds in a conf property? The
other option, passing an explicit `RpcTimeout`, would look like this
```scala
askWithRetry[Boolean](BlockManagerHeartbeat(blockManagerId),
  new RpcTimeout(600 seconds, "BlockManagerHeartbeat"))
```