Tsuyoshi OZAWA commented on YARN-1879:

Talked with Jian offline. 

In this case, the application's token expires once the AM's container has
finished, so I don't think we need to handle it.

I'd like to confirm whether finishApplicationMaster() can be issued after the AM
containers exit. There is no such case, but finishApplicationMaster() can be
issued after the RM has removed the AM's entry, in the following case:

1. RM1 saves the app in RMStateStore and then crashes.
2. FinishApplicationMasterResponse#isRegistered still returns false.
3. The AM keeps retrying against the second RM.
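A minimal sketch of why the retried call has to be idempotent, assuming a simplified in-memory model of the three steps above. FinishRetrySketch, SimulatedRM, finishWithFailover and stateStore are hypothetical names for illustration only, not YARN APIs. The shared map stands in for RMStateStore, and a thrown exception stands in for RM1 crashing before it can reply.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the failover scenario; none of these classes are YARN APIs. */
public class FinishRetrySketch {

    // Stands in for RMStateStore: its contents survive an RM crash.
    static final Map<String, String> stateStore = new HashMap<>();

    static class SimulatedRM {
        private final boolean crashAfterSave;

        SimulatedRM(boolean crashAfterSave) {
            this.crashAfterSave = crashAfterSave;
        }

        /** Returns true when the app is confirmed unregistered; throws if the RM "crashes". */
        boolean finishApplicationMaster(String appId) {
            // Idempotent path: a retried call for an app already saved as
            // FINISHED is acknowledged instead of being rejected.
            if ("FINISHED".equals(stateStore.get(appId))) {
                return true;
            }
            stateStore.put(appId, "FINISHED");
            if (crashAfterSave) {
                // Step 1: the state is saved, but the RM dies before replying.
                throw new IllegalStateException("RM crashed before responding");
            }
            return true;
        }
    }

    /** AM side: try each RM in turn until unregistration is confirmed; returns attempts used, or -1. */
    static int finishWithFailover(String appId, SimulatedRM... rms) {
        int attempts = 0;
        for (SimulatedRM rm : rms) {
            attempts++;
            try {
                if (rm.finishApplicationMaster(appId)) {
                    return attempts;
                }
            } catch (IllegalStateException e) {
                // Steps 2-3: no positive response arrived, so fail over to the next RM.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SimulatedRM rm1 = new SimulatedRM(true);   // saves state, then crashes
        SimulatedRM rm2 = new SimulatedRM(false);  // recovered RM that serves the retry
        int attempts = finishWithFailover("app_1", rm1, rm2);
        System.out.println("unregistered after " + attempts + " attempts");
    }
}
```

Running main prints "unregistered after 2 attempts": RM1 persists the final state and crashes, and RM2, finding that state already saved, acknowledges the retry rather than treating it as an error.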

Thanks very much for clarifying, Jian. Attached an updated patch which includes
a test for a retried finishApplicationMaster and a test for a retried
registerApplicationMaster before and after RM restart.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> ------------------------------------------------------------------------------------
>                 Key: YARN-1879
>                 URL: https://issues.apache.org/jira/browse/YARN-1879
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Tsuyoshi OZAWA
>            Priority: Critical
>         Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.23.patch, YARN-1879.24.patch, YARN-1879.25.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch

This message was sent by Atlassian JIRA
