[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549218#comment-14549218 ]
Steve Loughran commented on YARN-3668:
--------------------------------------

[~sandflee]: I know you are using something else; I was just describing what we do to deal with failures.

If it is purely AM failure that you care about, then setting the restart bit at launch time is enough for YARN to bring things back. If the AM fails too many times within the failure window, then the app will fail, for which there is one fix: don't fail as often.

I'd actually like a failure code to tell YARN to restart us without counting it as a failure; this would help us do live updates more safely.

> Long run service shouldn't be killed even if Yarn crashed
> ---------------------------------------------------------
>
>                 Key: YARN-3668
>                 URL: https://issues.apache.org/jira/browse/YARN-3668
>             Project: Hadoop YARN
>          Issue Type: Wish
>            Reporter: sandflee
>
> For long running service, it shouldn't be killed even if all yarn component
> crashed, with RM work preserving and NM restart, yarn could take over
> applications again.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)