Tsuyoshi OZAWA commented on YARN-1367:

I've read your code. The prototype includes the following changes:
1. Changed NodeManager's RegisterNodeManagerRequest to send ContainerReport.
3. Added cluster timestamp to Container Id.
I think we should focus on making the NM resync with the RM when
RM_WORK_PRESERVING_RECOVERY_ENABLED is set to true. Can you add the
resync code (the ResourceManager-side code) to the patch? Also, regarding
the ContainerId format, let's discuss it on YARN-2052.
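
To make the request concrete, here is a rough sketch of the resync flow
described in this issue. Every class and method name in it is a
hypothetical stand-in chosen for illustration; it is not the actual YARN
API and not the contents of the prototype patch. It only assumes that the
NM keeps a list of running containers and that the RM can rebuild its view
of a node from the reports sent at registration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a container report; not the real YARN class.
final class SketchContainerReport {
    final String containerId;
    final int memoryMb;

    SketchContainerReport(String containerId, int memoryMb) {
        this.containerId = containerId;
        this.memoryMb = memoryMb;
    }
}

// Hypothetical RM side: rebuilds its view of a node from the reports
// that arrive with the registration request.
final class SketchResourceManager {
    final Map<String, SketchContainerReport> recovered = new LinkedHashMap<>();

    void registerNode(List<SketchContainerReport> reports) {
        recovered.clear();
        for (SketchContainerReport report : reports) {
            recovered.put(report.containerId, report);
        }
    }
}

// Hypothetical NM side: decides what to do when the RM asks for a resync.
final class SketchNodeManager {
    final List<SketchContainerReport> running = new ArrayList<>();

    void launch(String containerId, int memoryMb) {
        running.add(new SketchContainerReport(containerId, memoryMb));
    }

    void resync(SketchResourceManager rm, boolean workPreservingRecoveryEnabled) {
        if (!workPreservingRecoveryEnabled) {
            // Current behavior: drop (kill) every container before re-registering.
            running.clear();
        }
        // Re-register and report whatever is still running on this node.
        rm.registerNode(new ArrayList<>(running));
    }
}
```

With the flag enabled, the containers survive the resync and the RM learns
about them from the registration; with it disabled, the NM re-registers
with an empty list, as it does today.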

> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.prototype.patch
> After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
> Upon receiving the resync response, the NM kills all containers and 
> re-registers with the RM. The NM should be changed to not kill the container 
> and instead inform the RM about all currently running containers including 
> their allocations etc. After the re-register, the NM should send all pending 
> container completions to the RM as usual.

This message was sent by Atlassian JIRA
