[
https://issues.apache.org/jira/browse/YARN-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184195#comment-15184195
]
Jun Gong commented on YARN-4770:
--------------------------------
Thanks [~vinodkv] for reporting the issue. The patch in YARN-3998 should have
handled this case.
{quote}
The relaunch feature needs to work across NM restarts, so we should save the
retry-context and policy per container into the state-store and reload it for
continue relaunching after NM restart.
{quote}
As [~vvasudev] said, "The container retry policy details are already stored in
the state-store as part of the ContainerLaunchContext", so we do not need to
handle it separately.
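For illustration only, here is a minimal sketch of the idea under discussion: retry settings are persisted per container and reloaded when the NM comes back up. Every name in it (RetryPolicy, StateStore, NodeManagerSketch and their methods) is hypothetical and is not the actual NM state-store API. The in-memory maps stand in for a durable store, and tracking the remaining-retries counter separately from the policy is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the retry settings carried with a container's launch context.
final class RetryPolicy {
    final int maxRetries;
    final long retryIntervalMs;

    RetryPolicy(int maxRetries, long retryIntervalMs) {
        this.maxRetries = maxRetries;
        this.retryIntervalMs = retryIntervalMs;
    }
}

// Hypothetical store that outlives a single NM instance; a real one would write to disk.
final class StateStore {
    private final Map<String, RetryPolicy> policies = new HashMap<>();
    private final Map<String, Integer> remainingRetries = new HashMap<>();

    void storeContainer(String containerId, RetryPolicy policy) {
        policies.put(containerId, policy);
        remainingRetries.put(containerId, policy.maxRetries);
    }

    void storeRemainingRetries(String containerId, int remaining) {
        remainingRetries.put(containerId, remaining);
    }

    Map<String, Integer> loadRemainingRetries() {
        return new HashMap<>(remainingRetries);
    }

    RetryPolicy loadPolicy(String containerId) {
        return policies.get(containerId);
    }
}

// Hypothetical NM: creating a new instance over the same store models an NM restart.
final class NodeManagerSketch {
    private final StateStore store;
    private final Map<String, Integer> remaining;

    NodeManagerSketch(StateStore store) {
        this.store = store;
        // Recovery step: reload the per-container retry state saved before the restart.
        this.remaining = store.loadRemainingRetries();
    }

    void startContainer(String containerId, RetryPolicy policy) {
        store.storeContainer(containerId, policy);
        remaining.put(containerId, policy.maxRetries);
    }

    // Returns true if the container should be relaunched, false if retries are exhausted.
    boolean onContainerFailed(String containerId) {
        int left = remaining.getOrDefault(containerId, 0);
        if (left <= 0) {
            return false;
        }
        left--;
        remaining.put(containerId, left);
        // Persist the updated count so it survives the next restart.
        store.storeRemainingRetries(containerId, left);
        return true;
    }

    int remainingRetries(String containerId) {
        return remaining.getOrDefault(containerId, 0);
    }
}
```

In this sketch a second NodeManagerSketch built over the same StateStore continues counting retries from where the first one stopped, which is the behaviour the quoted requirement asks for.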
{quote}
We should also handle restarting of any containers that may have crashed during
the NM reboot.
{quote}
If a container crashed during the NM reboot, it would transition to the
RELAUNCHING state. I will check this again.
> Auto-restart of containers should work across NM restarts.
> ----------------------------------------------------------
>
> Key: YARN-4770
> URL: https://issues.apache.org/jira/browse/YARN-4770
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
>
> See my comment
> [here|https://issues.apache.org/jira/browse/YARN-3998?focusedCommentId=15133367&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133367]
> on YARN-3998. Need to take care of two things:
> - The relaunch feature needs to work across NM restarts, so we should save
> the retry-context and policy per container into the state-store and reload it
> for continue relaunching after NM restart.
> - We should also handle restarting of any containers that may have crashed
> during the NM reboot.