Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9992#discussion_r46183794
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -228,6 +228,7 @@ private[spark] class ApplicationMaster(
       }
     
       private def sparkContextStopped(sc: SparkContext) = {
    +    sc.requestTotalExecutors(0, 0, Map.empty)
    --- End diff --
    
    This doesn't really fix the issue; it just makes the race less likely. 
`sc.requestTotalExecutors` is asynchronous, so the allocator may still run 
before the message actually arrives in `AMEndpoint.receiveAndReply`.
    
    That being said, I'm not sure the race is completely fixable. But you could 
get closer by moving the code currently in the `RequestExecutors` message 
handler in `AMEndpoint` into a separate method, and calling that method 
directly from here.
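
    To illustrate the suggestion, here is a rough sketch of the difference 
between the asynchronous message path and a direct method call. It is a toy 
model, not Spark code: `Allocator`, `Endpoint`, `updateTarget`, `send`, 
`deliverPending`, and `containersToRequest` are all hypothetical names, and the 
local `RequestExecutors` case class only borrows the name of the real message.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
final case class RequestExecutors(total: Int)

final class Allocator(initialTarget: Int) {
  @volatile private var target: Int = initialTarget

  def setTarget(n: Int): Unit = { target = n }

  // Containers a periodic allocation pass would ask for right now.
  def containersToRequest(running: Int): Int = math.max(target - running, 0)
}

final class Endpoint(allocator: Allocator) {
  private val inbox = mutable.Queue.empty[RequestExecutors]

  // Handler logic pulled into its own method so it can also be called directly.
  def updateTarget(total: Int): Unit = allocator.setTarget(total)

  // Asynchronous path: the message is only queued here, not yet applied.
  def send(msg: RequestExecutors): Unit = inbox.enqueue(msg)

  // Runs later, whenever the dispatcher gets to the queued messages.
  def deliverPending(): Unit =
    while (inbox.nonEmpty) updateTarget(inbox.dequeue().total)
}
```

    In this model, an allocation pass that runs between `send(RequestExecutors(0))` 
and `deliverPending()` still sees the old target and would request containers, 
while a pass that runs after `updateTarget(0)` returns sees a target of zero. 
As noted above, even the direct call only narrows the window; a pass that has 
already started is not affected by it.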

