[ 
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549218#comment-14549218
 ] 

Steve Loughran commented on YARN-3668:
--------------------------------------

[~sandflee]: I know you are using something else; I was just describing what 
we do to deal with failures. 

If AM failure is all you care about, then setting the restart bit at launch 
time is enough for YARN to bring things back. If the AM fails too many times 
within the failure window, the app will fail, for which there is one fix: 
don't fail as often.
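
To make the "too many times in the failure window" rule concrete, here is a
minimal sketch of sliding-window failure accounting. It is a simplified model
and not YARN's actual code: the class name AttemptFailureWindow and its
parameters are hypothetical. In a real application the corresponding limits
are, as far as I know, set on the ApplicationSubmissionContext
(setMaxAppAttempts and setAttemptFailuresValidityInterval).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of AM failure counting inside a time window; not YARN code. */
final class AttemptFailureWindow {
    private final int maxAttempts;    // hypothetical: failures tolerated per window
    private final long windowMillis;  // hypothetical: length of the failure window
    private final Deque<Long> failures = new ArrayDeque<>();

    AttemptFailureWindow(int maxAttempts, long windowMillis) {
        if (maxAttempts < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxAttempts = maxAttempts;
        this.windowMillis = windowMillis;
    }

    /**
     * Records a failure at nowMillis (timestamps are assumed non-decreasing).
     * Returns true if a restart is still allowed, false if the app should fail.
     */
    boolean recordFailure(long nowMillis) {
        failures.addLast(nowMillis);
        // Forget failures that have aged out of the window.
        while (nowMillis - failures.peekFirst() >= windowMillis) {
            failures.removeFirst();
        }
        return failures.size() < maxAttempts;
    }
}
```

With maxAttempts = 2 and a 1000 ms window, failures at t=0 and t=1500 both
permit a restart because the first has aged out by the time of the second; a
further failure at t=1600 puts two failures inside the window, so the app
fails. Failing less often, or spreading failures further apart than the
window, is the only way to stay under the limit.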

I'd actually like a failure code to tell YARN to restart us without counting it 
as a failure; this would help us do live updates more safely.

> Long run service shouldn't be killed even if Yarn crashed
> ---------------------------------------------------------
>
>                 Key: YARN-3668
>                 URL: https://issues.apache.org/jira/browse/YARN-3668
>             Project: Hadoop YARN
>          Issue Type: Wish
>            Reporter: sandflee
>
> A long-running service shouldn't be killed even if all YARN components 
> crash; with RM work-preserving restart and NM restart, YARN could take over 
> the applications again.



