GitHub user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/6205#issuecomment-115108550
  
    Hey @squito and @zsxwing, I just came across a weird problem... After 
running the Spark shell, I noticed this exception being thrown:
    ```
    15/06/24 21:37:01 ERROR Utils: Uncaught exception in thread 
heartbeat-receiver-event-loop-thread
    org.apache.spark.SparkException: Error sending message [message = 
(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51109)),600 seconds)]
        at 
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:118)
        at 
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
        at 
org.apache.spark.scheduler.DAGScheduler.executorHeartbeatReceived(DAGScheduler.scala:188)
        at 
org.apache.spark.scheduler.TaskSchedulerImpl.executorHeartbeatReceived(TaskSchedulerImpl.scala:371)
        at 
org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2$$anonfun$run$2.apply$mcV$sp(HeartbeatReceiver.scala:107)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1264)
        at 
org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2.run(HeartbeatReceiver.scala:106)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.spark.SparkException: Unmatched message 
(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51109)),600 seconds) 
from Actor[akka://sparkDriver/temp/$To]
    ```
    I traced it down to the `BlockManagerHeartbeat` message 
[here](https://github.com/BryanCutler/spark/blob/configTimeout-6980/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L189).
    I couldn't figure out why the message was not matched, or how the code was 
even compiling, but it turns out the message is being turned into a tuple of 
the original message and the `FiniteDuration` - nice, haha!  I checked and 
didn't see this happening anywhere else, and once I fixed it up the problem 
went away.  So my questions are:
    
    * Is this something that we need to prevent from accidentally happening 
again?
    * To fix this line, can we put the 600 seconds in a conf property?  The 
other option, using an `RpcTimeout`, would look like this:
    ```scala
    askWithRetry(BlockManagerHeartbeat(blockManagerId),
      new RpcTimeout(600 seconds, "BlockManagerHeartbeat"))
    ```
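
    For anyone else hitting this, here is a minimal, self-contained sketch of 
the auto-tupling pitfall (the `ask` method and messages below are hypothetical 
stand-ins, not Spark's actual API): when a one-parameter method is applied to 
two arguments, the Scala 2 compiler silently adapts the call into a single 
tuple argument, so the receiver's pattern match falls through.
    ```scala
    import scala.concurrent.duration._

    object AutoTuplingDemo {
      // Hypothetical single-parameter "ask", standing in for an RPC send
      // that pattern-matches on the incoming message.
      def ask(message: Any): String = message match {
        case s: String => s"matched: $s"
        case other     => s"unmatched: $other"
      }

      def main(args: Array[String]): Unit = {
        // Intended call: the message alone, which the receiver matches.
        println(ask("heartbeat"))

        // What the buggy line amounted to: passing the message and the
        // timeout together, which auto-tupling silently turns into a
        // single Tuple2 argument, so no case matches it.
        println(ask(("heartbeat", 600.seconds)))
      }
    }
    ```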

