Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/12078#issuecomment-204329728
I see your point: defer registering executors until they are fully created. But Spark does already guard against this issue with the code below, though, as I said, it is not an elegant way to handle the race condition.
```scala
if (executor == null) {
  logError("Received LaunchTask command but executor was null")
  System.exit(1)
} else {
```
Looking at the description of this JIRA, a deeper problem is that the driver's scheduler is not aware of the bad machine and repeatedly assigns tasks to that node, eventually failing the job. So in the short term this PR may fix the race condition, but the race only happens on slow machines (which is why I haven't hit this problem before), so a more generic solution would be for the scheduler to be aware of bad executors/nodes. Just my two cents; not directly relevant to this PR.
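To illustrate the idea of deferring registration, here is a minimal, self-contained sketch (all names are hypothetical, not Spark's actual classes): the backend only advertises readiness after the executor is fully constructed, and a lost race degrades gracefully instead of calling `System.exit(1)` as in the snippet above.

```scala
// Hypothetical sketch: defer "registration" until the executor is fully
// constructed, so a LaunchTask-style message can never observe a
// half-initialized executor. Not Spark's actual implementation.
import java.util.concurrent.atomic.AtomicReference

final class Executor(val id: String)

final class Backend {
  // Holds null until registerExecutor is called after full construction.
  private val executor = new AtomicReference[Executor](null)

  // Called only once the Executor is fully built.
  def registerExecutor(e: Executor): Unit = executor.set(e)

  // Returns false when the race is lost, instead of exiting the JVM.
  def launchTask(): Boolean = {
    val e = executor.get()
    if (e == null) {
      println("Received LaunchTask command but executor was null")
      false
    } else {
      true
    }
  }
}

object DeferredRegistrationSketch {
  def main(args: Array[String]): Unit = {
    val b = new Backend
    assert(!b.launchTask())               // before registration: race lost
    b.registerExecutor(new Executor("1")) // executor now fully created
    assert(b.launchTask())                // after registration: safe
    println("ok")
  }
}
```

The key design choice is that nothing can route a task to the backend until `registerExecutor` has run, which removes the window the quoted null check is defending against.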