Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/17113
  
    > Another thing I thought about as I was reviewing this -- spark currently 
assumes that a fetchfailure is always the fault of the source, never the 
destination. I almost wonder if we should count it against both, with some 
sensible heuristic for looking at a collection of failures and deciding who is 
at fault.
    
    I think it makes sense to try to tell the difference and improve our 
logic there.
    
    > Perhaps we should think of a better way to choose the right behavior.
    
    Does this mean you don't want to go with this approach?  I'm actually not 
sure this is a huge change.  It's a decent change in behavior, but for the cases 
where nodes really do go down this could help a lot.  I think Spark definitely 
doesn't handle this case well now.
    
    Sorry again that I haven't done a full review.  I've been trying to think 
the entire fetch failure scenario through and have just been busy with other 
things.
    
    One downside to adding the config BLACKLIST_FETCH_FAILURE_ENABLED is that 
it limits us if we later want to change this functionality to, say, blacklist 
only on multiple fetch failures.  We could track fetch failures across stage 
attempts and blacklist only after X tasks have failed, where those failures 
could span stage attempts.  I guess it's marked as experimental, so it's a bit 
more OK for us to change it.  Perhaps X isn't just failed tasks but failed 
tasks from a % of different hosts.  You could potentially also use that to 
determine whether the source or the destination is bad.  If many tasks across 
hosts have failed, then you'd expect the source is bad; if it's just one 
destination, then perhaps that destination is bad.  Still thinking this through.
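
    To make the idea above concrete, here is a rough sketch in Python (not 
Spark code, and not what the PR implements).  The class name 
FetchFailureTracker, the parameters total_hosts, min_failures and 
min_host_fraction, and their default values are all made up for illustration; 
min_failures plays the role of X and min_host_fraction the "% of different 
hosts".  It assumes the scheduler knows how many hosts are in the cluster.

```python
from collections import defaultdict


class FetchFailureTracker:
    """Hypothetical helper, not a Spark class: accumulates fetch failures
    across stage attempts and guesses which side is at fault."""

    def __init__(self, total_hosts, min_failures=3, min_host_fraction=0.5):
        self.total_hosts = total_hosts              # hosts in the cluster (assumed known)
        self.min_failures = min_failures            # the "X" failed tasks threshold
        self.min_host_fraction = min_host_fraction  # the "% of different hosts" threshold
        # source host -> list of (destination host, stage attempt) failures
        self._failures = defaultdict(list)

    def record(self, source, destination, stage_attempt):
        """Record one failed fetch; failures accumulate across stage attempts."""
        self._failures[source].append((destination, stage_attempt))

    def suspect(self, source):
        """Return ("source", host), ("destination", host), or None if undecided."""
        failures = self._failures.get(source, [])
        if len(failures) < self.min_failures:
            return None  # not enough evidence yet, so blacklist nothing
        destinations = {dest for dest, _ in failures}
        other_hosts = self.total_hosts - 1
        if other_hosts > 0 and len(destinations) / other_hosts >= self.min_host_fraction:
            return ("source", source)  # many different hosts failed to fetch from it
        if len(destinations) == 1:
            return ("destination", next(iter(destinations)))  # only one reader is failing
        return None  # mixed picture, stay undecided


# Three failures against hostA, from three different hosts, over two stage attempts.
tracker = FetchFailureTracker(total_hosts=5)
tracker.record("hostA", "hostB", stage_attempt=0)
tracker.record("hostA", "hostC", stage_attempt=0)
tracker.record("hostA", "hostD", stage_attempt=1)
print(tracker.suspect("hostA"))
```

    This sketch only looks at failures against a single source at a time.  A 
fuller version would probably also check whether the suspected destination is 
failing against other sources before blaming it.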


