Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/17088
FYI, this is somewhat related to https://github.com/apache/spark/pull/17113
I mention it because I think both depend on how we handle failures and
retries. Together, the two changes could cause bad things to occur. As I
mentioned on that PR, we specifically had issues with these things on TEZ. So
I agree with others that we need to be careful here. Removing all could cause
a lot more work than needed. Personally, with the defaults that Spark has for
shuffle retries, I think a fetch failure can easily be a transient issue
(rolling upgrade, temporarily overloaded NM, etc.).
I need to refresh my memory on all the interactions, and then I'll get back to you.