GitHub user squito commented on the issue:

    https://github.com/apache/spark/pull/15249
  
    I agree with Kay's summary above, with just one addition.  For (2) Temporary 
Resource Contention & the approach in this PR: perhaps it's obvious, but 
another consequence of this approach is that you lose resources for computing 
tasks, even if task locality was never a consideration.  One of your executors 
is temporarily in trouble, so it fails a bunch of tasks and then gets 
blacklisted for the entire taskset.  10 seconds later, it's back to an OK 
state, but even if your taskset takes hours, you'd never take advantage of that 
executor again.
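
    To make the cost concrete, here is a toy simulation (a sketch, not Spark 
code). It assumes two single-slot executors, one-second tasks, and a flaky 
executor that fails every task for its first 10 seconds. The function name, 
its parameters, and the timeout-based policy are hypothetical and only there 
for comparison; they are not part of this PR.

```python
def simulate(num_tasks, unhealthy_until, max_failures, blacklist_timeout=None):
    """Toy model: one healthy executor and one flaky executor, one task per tick.

    blacklist_timeout=None  -> flaky executor stays blacklisted for the whole taskset.
    blacklist_timeout=N     -> hypothetical policy: re-admit it after N ticks.
    Returns (ticks needed to finish, tasks completed on the flaky executor).
    """
    remaining = num_tasks
    t = 0
    failures = 0
    blacklisted_at = None
    tasks_on_flaky = 0

    while remaining > 0:
        # The healthy executor always completes one task per tick.
        remaining -= 1

        # Hypothetical policy: lift the blacklist once the timeout has elapsed.
        if (blacklisted_at is not None and blacklist_timeout is not None
                and t - blacklisted_at >= blacklist_timeout):
            blacklisted_at = None
            failures = 0

        # The flaky executor only gets work while it is not blacklisted.
        if blacklisted_at is None and remaining > 0:
            if t < unhealthy_until:
                # Still in trouble: the task fails and must be retried later.
                failures += 1
                if failures >= max_failures:
                    blacklisted_at = t
            else:
                remaining -= 1
                tasks_on_flaky += 1
        t += 1

    return t, tasks_on_flaky


if __name__ == "__main__":
    print("taskset-wide blacklist:", simulate(100, 10, 2))
    print("15-tick timeout:       ", simulate(100, 10, 2, blacklist_timeout=15))
```

    In this model the taskset-wide blacklist leaves the recovered executor 
idle for the rest of the taskset, so the healthy executor does all the work 
alone, while the timeout variant gets the second executor back once it has 
recovered.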
    
    I think the situation w/ (1) & this PR is fine.
    
    Also, I realized this was discussed somewhat in the design doc, under the 
[Flaky Apps Section](https://docs.google.com/document/d/1R2CVKctUZG9xwD67jkRdhBR4sCgccPR2dhTYSRXFEmg/edit#heading=h.3yb336nr3vy1) 
(not that it adds much more than what we have discussed here).

