Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5311#discussion_r27591507
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -173,32 +179,24 @@ private[spark] class ExecutorAllocationManager(
       }
     
       /**
    -   * Register for scheduler callbacks to decide when to add and remove executors.
    +   * Register for scheduler callbacks to decide when to add and remove executors, and start
    +   * the scheduling task.
        */
       def start(): Unit = {
         listenerBus.addListener(listener)
    -    startPolling()
    +
    +    val scheduleTask = new Runnable() {
    +      override def run(): Unit = Utils.logUncaughtExceptions(schedule())
    +    }
    +    executor.scheduleAtFixedRate(scheduleTask, 0, intervalMillis, TimeUnit.MILLISECONDS)
       }
     
       /**
    -   * Start the main polling thread that keeps track of when to add and remove executors.
    +   * Stop the allocation manager.
        */
    -  private def startPolling(): Unit = {
    -    val t = new Thread {
    -      override def run(): Unit = {
    -        while (true) {
    -          try {
    -            schedule()
    -          } catch {
    -            case e: Exception => logError("Exception in dynamic executor allocation thread!", e)
    -          }
    -          Thread.sleep(intervalMillis)
    -        }
    -      }
    -    }
    -    t.setName("spark-dynamic-executor-allocation")
    -    t.setDaemon(true)
    -    t.start()
    +  def stop(): Unit = {
    +    executor.shutdown()
    +    executor.awaitTermination(10, TimeUnit.SECONDS)
    --- End diff --
    
    Thinking aloud here:
    We stop the ExecutorAllocationManager after stopping the DAGScheduler,
    which means that in yarn-client mode we will already have torn down the YARN
    application. If `schedule` is called after that, it could try to make an
    RPC to the AM, which is no longer there. So waiting the full 10 seconds
    for that RPC to time out could be a common case.
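
To make the concern concrete, here is a minimal, self-contained sketch (not the code in this PR) of how `shutdown()` followed by `awaitTermination` behaves when a scheduled task is already blocked. `StopDelaySketch`, `blockedSchedule`, and `timeStop` are hypothetical names, and `Thread.sleep` is only an assumed stand-in for a `schedule()` call stuck on an RPC to an AM that is gone.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object StopDelaySketch {
  // Hypothetical stand-in for schedule() blocking on an RPC that never gets a reply.
  // The latch signals that the task has actually started running.
  def blockedSchedule(started: CountDownLatch, rpcTimeoutMillis: Long): Runnable =
    new Runnable {
      override def run(): Unit = {
        started.countDown()
        try Thread.sleep(rpcTimeoutMillis)
        catch { case _: InterruptedException => () } // interrupted by shutdownNow()
      }
    }

  // Mimics the stop() in the diff and returns how long it blocked, in milliseconds.
  def timeStop(rpcTimeoutMillis: Long, awaitMillis: Long): Long = {
    val executor = Executors.newSingleThreadScheduledExecutor()
    val started = new CountDownLatch(1)
    executor.scheduleAtFixedRate(
      blockedSchedule(started, rpcTimeoutMillis), 0, 100, TimeUnit.MILLISECONDS)
    started.await() // make sure a run is in flight before stopping

    val begin = System.nanoTime()
    executor.shutdown() // does not interrupt the run that is in flight
    executor.awaitTermination(awaitMillis, TimeUnit.MILLISECONDS)
    val elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin)

    executor.shutdownNow() // clean up: interrupt the blocked task so the thread exits
    elapsedMillis
  }

  def main(args: Array[String]): Unit = {
    // The simulated RPC outlasts the await window, so stop blocks for the whole window.
    println(s"stop blocked for about ${timeStop(2000, 500)} ms")
  }
}
```

In this sketch the caller blocks for roughly the whole await window whenever the in-flight task outlasts it. Whether interrupting the task (for example with `shutdownNow()`) would be acceptable here depends on how the real RPC call reacts to interruption, which this sketch does not model.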

