GitHub user mridulm commented on the issue:

    https://github.com/apache/spark/pull/15249
  
    @squito I am hoping we _can_ actually remove the old code/functionality (it
is clunky and very specific to the single-executor resource contention/shutdown
use case, which is unfortunately common enough to have warranted its
introduction) and subsume it with a better design/implementation, perhaps as
part of your work in this and other PRs.
    
    @kayousterhout I believe my concern with (2) is that the blacklist is
(currently) permanent for a task/taskset on an executor/node. For jobs running
on a large number of executors, this is probably not much of an issue (other
than degraded performance). But as the executor/node count decreases, the
probability of job failure increases, even when the underlying transient
failures are recoverable.
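To make the scaling argument concrete, here is a deliberately simplified toy model (an assumption for illustration, not Spark's actual scheduler logic): each attempt of a task fails transiently and independently with probability p, every failure permanently blacklists that executor for the task, and the task is retried on the next executor. With N executors, the task becomes unschedulable only if all N attempts fail, i.e. with probability p^N, which grows quickly as N shrinks. The function name `prob_task_unschedulable` and the value of p are hypothetical, and the model ignores other limits such as `spark.task.maxFailures`.

```python
# Toy model (illustrative assumption, NOT Spark's scheduler implementation):
# - each task attempt fails transiently and independently with probability p
# - a failure permanently blacklists that executor for this task
# - the task is retried on another executor until none are left
# The task (and thus the job) fails only if all N executors get blacklisted.

def prob_task_unschedulable(num_executors: int, p_transient: float) -> float:
    """Probability that every executor ends up blacklisted for one task."""
    if num_executors < 1:
        raise ValueError("num_executors must be >= 1")
    if not 0.0 <= p_transient <= 1.0:
        raise ValueError("p_transient must be in [0, 1]")
    return p_transient ** num_executors


if __name__ == "__main__":
    p = 0.05  # hypothetical per-attempt transient failure probability
    for n in (2, 5, 20, 100):
        # Fewer executors -> far higher chance the task runs out of places to run.
        print(f"{n:>3} executors: {prob_task_unschedulable(n, p):.3e}")
```

Under these assumptions the failure probability drops geometrically with the executor count, which is why a permanent blacklist is mostly a performance cost on large clusters but a real correctness risk on small ones.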
