[ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998631#comment-13998631 ]

Tsuyoshi OZAWA commented on YARN-1367:
--------------------------------------

Some comments on the patch:

1. Can you fix the indentation in the following hunks?
{code}
+  public boolean isWorkPreservingRestartEnabled() { return
+      isWorkPreservingRestartEnabled;
+  }
{code}

{code}
+          if (!rmWorkPreservingRestartEnbaled)
+          {
+            containerManager.cleanupContainersOnNMResync();
+          }
{code}
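For illustration, here is roughly the layout I have in mind for the two hunks above: two-space indentation, the getter body on its own line, and the opening brace on the same line as the "if". This is only a sketch. ContainerManagerStub, ResyncFormattingSketch and onResync are hypothetical stand-ins so that the snippet compiles on its own; they are not the real NodeManager classes, and the spelling of the flag is normalized here.

```java
// Hypothetical stand-in so the snippet compiles on its own;
// this is not the real NodeManager container manager.
class ContainerManagerStub {
  int cleanupCalls = 0;

  void cleanupContainersOnNMResync() {
    cleanupCalls++;
  }
}

// Hypothetical holder class, used only to show the formatting.
class ResyncFormattingSketch {
  private final boolean isWorkPreservingRestartEnabled;
  private final ContainerManagerStub containerManager;

  ResyncFormattingSketch(boolean enabled, ContainerManagerStub cm) {
    this.isWorkPreservingRestartEnabled = enabled;
    this.containerManager = cm;
  }

  // Getter body on its own line, two-space indentation.
  public boolean isWorkPreservingRestartEnabled() {
    return isWorkPreservingRestartEnabled;
  }

  // Opening brace on the same line as the condition.
  void onResync(boolean rmWorkPreservingRestartEnabled) {
    if (!rmWorkPreservingRestartEnabled) {
      containerManager.cleanupContainersOnNMResync();
    }
  }
}
```

The behavior is the same as in the patch; only the whitespace and brace placement differ.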

2. IMO, "recovery.work-preserving-restart.enabled" is a more appropriate key 
because this is one of the options under the RECOVERY_ENABLED namespace. 
{code}
  public static final String RM_WORK_PRESERVING_RECOVERY_ENABLED = RM_PREFIX
      + "work-preserving.recovery.enabled";
{code}
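A minimal sketch of what the suggested key could look like. The values of RM_PREFIX and RECOVERY_ENABLED below are my assumption of what YarnConfiguration defines and are not taken from the patch; RecoveryKeysSketch is a hypothetical class used only to make the snippet self-contained.

```java
// Hypothetical holder class; the real constants live in YarnConfiguration.
class RecoveryKeysSketch {
  // Assumed to mirror YarnConfiguration; not taken from the patch.
  public static final String RM_PREFIX = "yarn.resourcemanager.";
  public static final String RECOVERY_ENABLED =
      RM_PREFIX + "recovery.enabled";

  // Suggested key: shares the "recovery." segment with RECOVERY_ENABLED.
  public static final String RM_WORK_PRESERVING_RECOVERY_ENABLED = RM_PREFIX
      + "recovery.work-preserving-restart.enabled";

  public static void main(String[] args) {
    // Print both keys to show that they share the same prefix.
    System.out.println(RECOVERY_ENABLED);
    System.out.println(RM_WORK_PRESERVING_RECOVERY_ENABLED);
  }
}
```

With this naming, both keys start with RM_PREFIX + "recovery.", so the new option reads as a sub-option of recovery.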


> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.prototype.patch
>
>
> After RM restart, the RM sends a resync response to NMs that heartbeat to it.
> Upon receiving the resync response, the NM kills all containers and
> re-registers with the RM. The NM should be changed to not kill the container
> and instead inform the RM about all currently running containers including
> their allocations etc. After the re-register, the NM should send all pending
> container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
