[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998592#comment-13998592
 ] 

Tsuyoshi OZAWA commented on YARN-1365:
--------------------------------------

I've read your code. The prototype includes the following changes:

1. Changed NodeManager's RegisterNodeManagerRequest to send ContainerReport.
2. Added a configuration option, RM_WORK_PRESERVING_RECOVERY_ENABLED.
3. Added the cluster timestamp to the ContainerId.

I think we should focus on making the NM resync with the RM when 
RM_WORK_PRESERVING_RECOVERY_ENABLED is set to true. Can you add the resync 
code (on the ResourceManager side) to the patch? Also, regarding the 
ContainerId format, let's discuss it on YARN-2052.
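
To make the request concrete, here is a rough sketch of the kind of gating I 
have in mind on the RM side. It is not taken from the patch: the class 
ResyncSketch, the method containersToRecover, and the property key string are 
all assumptions for illustration, and java.util.Properties stands in for the 
real configuration object so the snippet is self-contained.

```java
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration only; none of these names come from the patch.
class ResyncSketch {
    // Assumed key string for the RM_WORK_PRESERVING_RECOVERY_ENABLED setting.
    static final String RM_WORK_PRESERVING_RECOVERY_ENABLED =
        "yarn.resourcemanager.work-preserving-recovery.enabled";

    // Returns the container ids the RM would try to recover when an NM
    // re-registers. With the flag off, nothing is recovered.
    static List<String> containersToRecover(Properties conf,
                                            List<String> reportedContainerIds) {
        boolean enabled = Boolean.parseBoolean(
            conf.getProperty(RM_WORK_PRESERVING_RECOVERY_ENABLED, "false"));
        if (!enabled) {
            return Collections.emptyList();
        }
        // Flag on: accept what the NM reported in its registration request.
        return List.copyOf(reportedContainerIds);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RM_WORK_PRESERVING_RECOVERY_ENABLED, "true");
        List<String> reported = List.of("container_1", "container_2");
        System.out.println(containersToRecover(conf, reported));
    }
}
```

The point is only that the resync path is taken when the flag is true and 
skipped otherwise; what recovery actually does with each reported container 
is left to the patch.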


> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1365
>                 URL: https://issues.apache.org/jira/browse/YARN-1365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed and 
> the RMApp state machine should transition to completed like normal. 
> Unregistration should succeed for an app that the RM considers complete since 
> the RM may have died after saving completion in the store but before 
> notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
