Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2828#discussion_r18986702
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -362,9 +372,19 @@ private[spark] class Worker(
         }
       }
     
    -  def masterDisconnected() {
    +  private def masterDisconnected() {
         logError("Connection to master failed! Waiting for master to reconnect...")
         connected = false
    +    scheduleAttemptsToReconnectToMaster()
    +  }
    +
    +  private def scheduleAttemptsToReconnectToMaster() {
    --- End diff --
    
    I'm okay with leaving it retrying indefinitely. The user may not notice the error until much later and only then restart the master. If the workers have stopped trying by that point, the user will need to bounce the workers as well.
    
    I agree that having such similar logic in two places is a bit of a pain, but these are really two different scenarios, so I could see the nearly duplicated logic being justified either way.
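
    To make the "retry indefinitely" option concrete, here is a rough standalone sketch. It is not the code in this PR: `ReconnectingWorker`, `tryConnect`, and `intervalMillis` are hypothetical names, and it uses a plain `java.util.concurrent` scheduler rather than whatever scheduling the Worker actually relies on.

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.util.Try

// Hypothetical stand-in for a worker; not Spark's actual Worker class.
// tryConnect is an assumed callback that returns true once the master answers.
class ReconnectingWorker(tryConnect: () => Boolean, intervalMillis: Long) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  @volatile private var connected = false
  @volatile private var retryTask: Option[ScheduledFuture[_]] = None
  val attempts = new AtomicInteger(0)

  def isConnected: Boolean = connected

  // Called when the link to the master drops. There is no attempt limit:
  // the task keeps firing until a connection succeeds.
  def masterDisconnected(): Unit = synchronized {
    connected = false
    if (retryTask.isEmpty) {
      val task = scheduler.scheduleAtFixedRate(new Runnable {
        def run(): Unit = {
          attempts.incrementAndGet()
          // A thrown exception would silently stop a fixed-rate task,
          // so treat any failure as "not connected yet".
          if (Try(tryConnect()).getOrElse(false)) masterReconnected()
        }
      }, 0L, intervalMillis, TimeUnit.MILLISECONDS)
      retryTask = Some(task)
    }
  }

  // Stop retrying once the master is reachable again.
  private def masterReconnected(): Unit = synchronized {
    connected = true
    retryTask.foreach(_.cancel(false))
    retryTask = None
  }

  def shutdown(): Unit = scheduler.shutdownNow()
}
```

    With this shape, a master that is restarted hours later is picked up on the next tick without anyone touching the workers. The cost is a periodic log line and connection attempt for as long as the master stays down.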

