Jian He commented on YARN-1367:

Thanks for working on the patch. 
The patch needs to be updated; can you please update it?
A few initial comments:
- Let's handle containerId separately in YARN-2052.
- The extra ContainerReport in RegisterNodeManagerRequest is no longer needed.
- The NM side may not need its own config for whether work-preserving restart
is enabled. Since the RM already has this config, the RM can instruct the NM to
keep_containers_on_resync when work-preserving restart is enabled and to
kill_containers_on_resync when it is not. This also avoids the overhead of
maintaining the config on each NM.
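
A minimal sketch of the suggested split, in Java since YARN is written in Java. The enum ResyncContainerPolicy and the methods decide() and onResync() are hypothetical names for illustration, not existing YARN APIs; the point is only that the RM owns the decision and the NM just follows it.

```java
// Sketch only: ResyncContainerPolicy, decide() and onResync() are
// hypothetical names for illustration, not existing YARN APIs.
public class ResyncPolicySketch {

    // Hypothetical instruction the RM would attach to its resync response.
    enum ResyncContainerPolicy { KEEP_CONTAINERS_ON_RESYNC, KILL_CONTAINERS_ON_RESYNC }

    // RM side: the work-preserving restart setting is read once on the RM,
    // so no matching setting is needed on every NM.
    static ResyncContainerPolicy decide(boolean workPreservingRestartEnabled) {
        return workPreservingRestartEnabled
                ? ResyncContainerPolicy.KEEP_CONTAINERS_ON_RESYNC
                : ResyncContainerPolicy.KILL_CONTAINERS_ON_RESYNC;
    }

    // NM side: the NM simply follows whatever the RM instructed.
    static String onResync(ResyncContainerPolicy policy) {
        switch (policy) {
            case KEEP_CONTAINERS_ON_RESYNC:
                return "re-register and report running containers";
            case KILL_CONTAINERS_ON_RESYNC:
                return "kill all containers, then re-register";
            default:
                throw new IllegalStateException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        System.out.println(onResync(decide(true)));
        System.out.println(onResync(decide(false)));
    }
}
```

With this shape, flipping the single RM-side setting changes the behaviour of every NM on the next resync, with no NM configuration change.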

> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.prototype.patch
> After RM restart, the RM sends a resync response to NMs that heartbeat to it.
> Upon receiving the resync response, the NM kills all containers and
> re-registers with the RM. The NM should be changed to not kill the containers
> and instead inform the RM about all currently running containers, including
> their allocations. After re-registering, the NM should send all pending
> container completions to the RM as usual.
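
A minimal sketch of the NM-side behaviour described in the issue, assuming a simplified in-memory container table. ContainerInfo, track(), buildReRegisterReport() and drainPendingCompletions() are hypothetical names, not the actual NodeManager classes or protocol records; the sketch only shows the ordering: keep containers, report the running ones with their allocations on re-register, then send pending completions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: all names below are hypothetical, not NodeManager classes.
public class NmResyncSketch {

    enum State { RUNNING, COMPLETE }

    // Simplified view of one container as tracked by the NM.
    static final class ContainerInfo {
        final String id;
        final State state;
        final int memoryMb; // allocation to report back to the RM

        ContainerInfo(String id, State state, int memoryMb) {
            this.id = id;
            this.state = state;
            this.memoryMb = memoryMb;
        }
    }

    private final Map<String, ContainerInfo> containers = new LinkedHashMap<>();
    private final List<String> pendingCompletions = new ArrayList<>();

    void track(ContainerInfo c) {
        containers.put(c.id, c);
        // Completions not yet sent to the RM are queued.
        if (c.state == State.COMPLETE) {
            pendingCompletions.add(c.id);
        }
    }

    // On resync: nothing is killed. The re-register payload lists every
    // container that is still running, together with its allocation.
    List<ContainerInfo> buildReRegisterReport() {
        List<ContainerInfo> running = new ArrayList<>();
        for (ContainerInfo c : containers.values()) {
            if (c.state == State.RUNNING) {
                running.add(c);
            }
        }
        return running;
    }

    // After re-registering, the next heartbeat carries the pending
    // completions as usual; they are handed out once and then cleared.
    List<String> drainPendingCompletions() {
        List<String> out = new ArrayList<>(pendingCompletions);
        pendingCompletions.clear();
        return out;
    }

    public static void main(String[] args) {
        NmResyncSketch nm = new NmResyncSketch();
        nm.track(new ContainerInfo("container_1", State.RUNNING, 1024));
        nm.track(new ContainerInfo("container_2", State.COMPLETE, 2048));
        nm.track(new ContainerInfo("container_3", State.RUNNING, 512));

        for (ContainerInfo c : nm.buildReRegisterReport()) {
            System.out.println("running: " + c.id + " (" + c.memoryMb + " MB)");
        }
        System.out.println("completed: " + nm.drainPendingCompletions());
    }
}
```

Keeping the completions in a separate queue means the re-register request only describes live containers, while finished ones still reach the RM through the normal heartbeat path.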

This message was sent by Atlassian JIRA