Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/17445
  
    There is a large discussion about how to handle fetch failures going on in 
https://issues.apache.org/jira/browse/SPARK-20178.  The fact that you got a 
fetch failure does not mean that all blocks are invalid or that the external 
shuffle service is completely down; it could very well be an intermittent 
failure.  There was also a PR to make the number of stage attempts configurable 
so you could increase it.
    
    If a lot of people are seeing this issue, the question is whether we need to 
do something shorter term to handle it while we are discussing SPARK-20178. 
Certainly, if we are seeing more actual job failures because of it, it would be 
better to invalidate all the output: the job possibly runs longer, but at least 
it doesn't fail.


