GitHub user squito commented on the issue:
https://github.com/apache/spark/pull/15249
I agree with Kay's summary above, with just one addition. For (2) Temporary
Resource Contention and the approach in this PR -- perhaps it's obvious, but
another consequence of this approach is that you lose resources for computing
tasks, even if task locality was never a consideration. Say one of your
executors is temporarily in trouble, so it fails a bunch of tasks and then gets
blacklisted for the entire taskset. Ten seconds later it's back in an OK
state, but even if your taskset takes hours, you'd never take advantage of that
executor again.
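
To put rough numbers on that, here is a toy model (not Spark code). The function name, its parameters, and the 60-second timeout are all made up for illustration. It assumes the executor is blacklisted at time 0, and it simplifies by treating the executor as usable again from whichever comes later, its recovery or the blacklist expiry:

```python
def usable_executor_seconds(taskset_duration_s, recovery_time_s,
                            blacklist_timeout_s=None):
    """Seconds of the taskset during which the once-flaky executor can run tasks.

    Hypothetical helper for illustration only.
    blacklist_timeout_s=None models blacklisting for the whole taskset.
    """
    if blacklist_timeout_s is None:
        # Blacklisted until the taskset finishes: the executor is never reused.
        return 0.0
    # Simplification: the executor is useful once it has both recovered
    # and come off the blacklist.
    eligible_from = max(recovery_time_s, blacklist_timeout_s)
    return max(0.0, taskset_duration_s - eligible_from)


two_hours = 2 * 60 * 60
lost_forever = usable_executor_seconds(two_hours, recovery_time_s=10)
with_timeout = usable_executor_seconds(two_hours, recovery_time_s=10,
                                       blacklist_timeout_s=60)
```

With taskset-wide blacklisting, the executor contributes nothing for the rest of a two-hour taskset, whereas with the assumed 60-second expiry it would be available for all but the first minute.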
I think the situation w/ (1) & this PR is fine.
Also, I realized this was discussed to some extent in the design doc, under the
[Flaky Apps Section](https://docs.google.com/document/d/1R2CVKctUZG9xwD67jkRdhBR4sCgccPR2dhTYSRXFEmg/edit#heading=h.3yb336nr3vy1)
(not that it adds much more than what we have discussed here).