[ 
https://issues.apache.org/jira/browse/YARN-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184195#comment-15184195
 ] 

Jun Gong commented on YARN-4770:
--------------------------------

Thanks [~vinodkv] for reporting the issue. The patch in YARN-3998 should have 
handled this case.

{quote}
The relaunch feature needs to work across NM restarts, so we should save the 
retry-context and policy per container into the state-store and reload it for 
continue relaunching after NM restart.
{quote}
As [~vvasudev] said, "The container retry policy details are already stored in 
the state-store as part of the ContainerLaunchContext", so we do not need to 
handle that separately.

{quote}
We should also handle restarting of any containers that may have crashed during 
the NM reboot.
{quote}
If a container crashed during the NM reboot, it would transition to the 
RELAUNCHING state. I will double-check this.

> Auto-restart of containers should work across NM restarts.
> ----------------------------------------------------------
>
>                 Key: YARN-4770
>                 URL: https://issues.apache.org/jira/browse/YARN-4770
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> See my comment 
> [here|https://issues.apache.org/jira/browse/YARN-3998?focusedCommentId=15133367&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133367]
>  on YARN-3998. Need to take care of two things:
>  - The relaunch feature needs to work across NM restarts, so we should save 
> the retry-context and policy per container into the state-store and reload it 
> for continue relaunching after NM restart.
>  - We should also handle restarting of any containers that may have crashed 
> during the NM reboot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
