GitHub user mccheah commented on the pull request:

    https://github.com/apache/spark/pull/4481#issuecomment-73664422
  
    An unreliable network can cause the driver's messages to be lost, so the 
driver retries even though the master is alive. We also create and stop a 
fair number of Spark contexts frequently, which can make the master's Akka 
message queue a bottleneck.
    
    We have had a few cases where the driver gave up but the master launched 
executors anyway, meaning that if the driver had waited a little longer, it 
could have proceeded with the job.

