Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/12078#issuecomment-204329728
  
    I see your point: defer registering the executor until it is fully
created. Spark does currently guard against this issue with the code below,
though as I said it is not an elegant way to handle the race condition.
    
    ```scala
          if (executor == null) {
            logError("Received LaunchTask command but executor was null")
            System.exit(1)
          } else {
    ```
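    The ordering problem can be sketched in a few lines (illustrative only; the class and method names here are hypothetical, not Spark's actual `CoarseGrainedExecutorBackend` internals). If registration is announced before the executor object exists, an early `LaunchTask` observes `null`; constructing the executor first closes the window.

    ```scala
    object RegistrationRace {
      final class Worker {
        def launchTask(id: Int): String = s"running task $id"
      }

      // Buggy ordering: announce readiness first, create the worker after,
      // so an early LaunchTask can observe a null worker.
      // Fixed ordering: fully construct the worker, then register.
      def handleRegistered(createFirst: Boolean): String = {
        var worker: Worker = null
        def register(): Unit = () // placeholder for sending the register message

        if (createFirst) {
          worker = new Worker // fully constructed before registering
          register()
        } else {
          register()          // registered, but worker is still null...
        }

        // a LaunchTask arriving right after registration:
        if (worker == null) "executor was null" else worker.launchTask(1)
      }
    }
    ```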
    Looking at the description of this JIRA, a deeper problem is that the
driver scheduler is not aware of this bad machine and repeatedly assigns tasks
to that node, finally making the job fail. So in the short term this PR may
solve the race condition, but the race only happens on some slow machines
(which is why I haven't hit this problem before), so a more generic solution
would be for the scheduler to be aware of bad executors/nodes. Just my two
cents, not directly relevant to this PR.

