Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/17113
So I looked at this a little more. I'm more OK with this since Spark
doesn't actually invalidate the shuffle output. You are basically just trying
to stop new tasks from running on the executors already on that host: it will
either just blacklist them, or kill them if you have that feature turned on.
Part of the reason we left this off to begin with was that we didn't want to
blacklist on transient fetch failures, so we wanted to wait and see whether it
was truly an issue in real life. If you do put this in, I would like it to be
configurable and off by default until we have more data on whether it's really
a problem users see.
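Concretely, I'd picture something like the sketch below. This is only an illustration of gating the behavior behind a conf that defaults to false; the `spark.blacklist.application.fetchFailure.enabled` key is a placeholder I made up for this example, not necessarily what this PR uses. `spark.blacklist.enabled` and `spark.blacklist.killBlacklistedExecutors` are existing settings.

```scala
import org.apache.spark.SparkConf

// Minimal sketch: the host-level blacklisting on fetch failure would only
// kick in when the user opts in; it stays off by default.
val conf = new SparkConf()
  .setAppName("fetch-failure-blacklist-example")
  // Existing blacklisting has to be on for any of this to matter.
  .set("spark.blacklist.enabled", "true")
  // Hypothetical flag for the behavior discussed here; default false.
  .set("spark.blacklist.application.fetchFailure.enabled", "false")
  // Existing option: actually kill blacklisted executors rather than
  // only avoiding scheduling new tasks on them.
  .set("spark.blacklist.killBlacklistedExecutors", "true")
```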
Spark does immediately abort the stage, but it doesn't kill the running
tasks, so if other tasks hit fetch failures before the map task can be rerun,
the scheduler learns about those failures too; that is very timing dependent
though.
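To make the timing dependence concrete, here is a tiny toy model, not Spark internals: the stage gets resubmitted on the first fetch failure, but tasks already running are left alone, so they may or may not report more failures from the same host before the map output is regenerated.

```scala
// Toy model of the race described above (all names here are illustrative).
object FetchFailureRace {
  case class FetchFailed(taskId: Int, host: String)

  def main(args: Array[String]): Unit = {
    var stageResubmitted = false

    def onFetchFailed(f: FetchFailed): Unit = {
      if (!stageResubmitted) {
        stageResubmitted = true
        println(s"resubmitting stage after first failure from ${f.host}")
      } else {
        // Late failures from still-running tasks; whether these arrive
        // before the map task is rerun is purely a matter of timing.
        println(s"additional failure from ${f.host} while rerun is pending")
      }
    }

    onFetchFailed(FetchFailed(1, "host1"))
    onFetchFailed(FetchFailed(2, "host1")) // running task was never killed
  }
}
```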