Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/17113
  
    So I looked at this a little more. I'm more OK with this since Spark doesn't actually invalidate the shuffle output. You are basically just trying to stop new tasks from running on the executors already on that host. It's either going to just blacklist those executors or kill them, if you have that feature turned on.
    
    Part of the reason we left this off to begin with was, again, that we didn't want to blacklist on transient fetch failures, so we wanted to wait and see whether it was truly an issue in real life. If you do put this in, I would like it to be configurable and off by default until we have more data on whether it's really a problem users see.
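
    For illustration, here is a minimal sketch of what opting in at submission time might look like, assuming the switch lands alongside the existing blacklist settings. The property names below (spark.blacklist.enabled, spark.blacklist.application.fetchFailure.enabled, spark.blacklist.killBlacklistedExecutors) are from the Spark 2.2/2.3 blacklist config family and may not match what this PR finally ships:

        import org.apache.spark.SparkConf

        // Sketch only: opting in to blacklisting on fetch failure.
        // All three flags default to false, so the behavior discussed
        // here stays off unless a user explicitly enables it.
        val conf = new SparkConf()
          .setAppName("fetch-failure-blacklist-demo")
          // Task/executor blacklisting itself is off by default.
          .set("spark.blacklist.enabled", "true")
          // Assumed opt-in flag for blacklisting on fetch failure.
          .set("spark.blacklist.application.fetchFailure.enabled", "true")
          // Optionally kill executors on a blacklisted node rather than
          // just steering new tasks away from them.
          .set("spark.blacklist.killBlacklistedExecutors", "true")

    Keeping everything behind explicit flags matches the "configurable off" approach above: transient fetch failures don't trigger blacklisting unless an operator has opted in.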
    
    Spark does immediately abort the stage, but it doesn't kill the running tasks, so if other tasks hit fetch failures before it can rerun the map task, the scheduler knows about them; that is very timing dependent, though.

