[GitHub] spark issue #18213: [SPARK-20996][YARN] Better handling AM reattempt based o...

vanzin Fri, 16 Jun 2017 11:09:06 -0700

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18213
  
    I'm not so sure about this... this is a fundamental change in how the 
feature works. With this change, there are only two cases where the AM will be 
retried:
    
    - the client-mode AM
    - when YARN kills the driver (`EXIT_EARLY`)
    
    Every other error will disable re-attempts. That includes the user app 
errors you mention, but it also includes infra issues (for example, an app 
failing because a temporary HDFS problem causes its tasks to fail).
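    
    To make the concern concrete, here is a rough sketch of the retry rule as 
described above. It is not the PR's code: `ExitReason`, `should_reattempt`, and 
the `USER_APP_ERROR` / `INFRA_ERROR` labels are hypothetical names made up for 
illustration; only `EXIT_EARLY` comes from the discussion.

```python
from enum import Enum


class ExitReason(Enum):
    """Why the previous attempt ended (labels are illustrative assumptions)."""
    EXIT_EARLY = "exit_early"          # YARN killed the driver
    USER_APP_ERROR = "user_app_error"  # hypothetical: bug in the user's app
    INFRA_ERROR = "infra_error"        # hypothetical: e.g. transient HDFS issue


def should_reattempt(is_client_mode: bool, reason: ExitReason) -> bool:
    """Retry only for the client-mode AM or when YARN killed the driver."""
    return is_client_mode or reason is ExitReason.EXIT_EARLY


# A cluster-mode app that failed on a transient infra problem is not retried,
# even though a second attempt might well have succeeded.
print(should_reattempt(False, ExitReason.INFRA_ERROR))  # False
```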
    
    I've seen the idea floating around of setting the max attempts to 1 by 
default, and I think that's a better solution (at least for cluster mode). That 
means users would have to opt in to driver re-attempts, which mitigates a lot 
of the problems with blindly re-attempting to run the whole application.
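    
    Users can already get that behavior per application with the existing 
`spark.yarn.maxAppAttempts` setting; when it is unset, the limit comes from 
YARN's `yarn.resourcemanager.am.max-attempts`. A minimal config fragment, 
assuming the setting goes into `spark-defaults.conf` (it can also be passed to 
`spark-submit` with `--conf`):

```
# Allow a single application attempt: no automatic driver re-attempts.
spark.yarn.maxAppAttempts  1
```

    Making 1 the default would only flip who has to set this: users who want 
re-attempts would raise the value themselves.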



