Github user mwws commented on the pull request:

    https://github.com/apache/spark/pull/8760#issuecomment-160544588
  
    @kayousterhout For example, tasks can fail because some host cannot reach a data 
source or the network, the local storage on some host is full, class linkage is 
broken, or some other run-time exception occurs. In such cases, I think the worker 
stays alive and the cluster manager cannot help much, so further tasks will still be 
submitted to the problematic executors/nodes. The original code base already contains 
a similar blacklist mechanism, `executorIsBlacklisted`, but the blacklist is 
maintained separately in every TaskSetManager (as `failedExecutors`) and is not 
shared, which means a newly submitted TaskSet cannot benefit from the previous 
experience of other TaskSets.
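    To make the contrast concrete, here is a minimal sketch of a scheduler-level 
blacklist shared across TaskSets; the class name, method names, and failure 
threshold are illustrative assumptions for this sketch, not the PR's actual API:

    ```scala
    import scala.collection.mutable

    // Sketch only: failure counts kept at the scheduler level, so every
    // TaskSetManager consults the same state instead of each one keeping
    // its own private `failedExecutors` map.
    class SharedBlacklistTracker(maxFailuresPerExecutor: Int = 2) {
      private val failureCounts = mutable.Map.empty[String, Int].withDefaultValue(0)

      def recordFailure(executorId: String): Unit = synchronized {
        failureCounts(executorId) += 1
      }

      // A TaskSet created later still sees failures recorded by earlier ones.
      def isBlacklisted(executorId: String): Boolean = synchronized {
        failureCounts(executorId) >= maxFailuresPerExecutor
      }
    }
    ```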
    
    This change does add complexity, but from the user's perspective they can simply 
use the built-in `SingleNodeStrategy`, which behaves the same as the original logic, 
or switch to the built-in `ExecutorAndNodeStrategy` by changing configuration only. 
At the same time, it gives advanced users the flexibility to plug in their own 
strategy.
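    As a rough illustration of the pluggable part, the strategy could be a small 
trait selected by configuration; the trait, method signature, and config key below 
are assumptions made for the sketch, not the exact interface proposed in the PR:

    ```scala
    // Hypothetical sketch of a pluggable blacklist strategy.
    trait BlacklistStrategy {
      // Given the shared failure history, return the executors that
      // further tasks should avoid.
      def blacklistedExecutors(failures: Map[String, Int]): Set[String]
    }

    // Mirrors the original per-executor behaviour.
    class SingleNodeStrategy(maxFailures: Int) extends BlacklistStrategy {
      def blacklistedExecutors(failures: Map[String, Int]): Set[String] =
        failures.collect { case (exec, n) if n >= maxFailures => exec }.toSet
    }

    object BlacklistStrategy {
      // Chosen purely from configuration, e.g. an illustrative key such as
      // "spark.scheduler.blacklist.strategy"; an advanced user could plug
      // in a custom implementation the same way.
      def fromConf(name: String, maxFailures: Int): BlacklistStrategy = name match {
        case "singleNode" => new SingleNodeStrategy(maxFailures)
        case other        => throw new IllegalArgumentException(s"Unknown strategy: $other")
      }
    }
    ```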
