[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089909#comment-14089909
 ] 

Jian He commented on YARN-2249:
-------------------------------

Thanks Wangda for the review.
bq. we can cache outstanding container release request until x secs after 
restart reached. And could you elaborate why you use NM liveness expire time? 
I chose the NM expiry time as the cache timeout because containers are forcibly 
killed once the NM expires, so we don't need to cache the release requests 
after that.
bq. we only need cache release request for a period of time after AM 
reconnected to RM.
Right, changed to cache the release request only within the timeout.
bq. We should notify AM about container completed message when we decide to not 
recover a container.
Good point, added.
bq. Can we wait for some state instead of Thread.sleep(3000);?
Since the container is gone, there's no state to wait on. I think this is fine.
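
To make the caching idea above concrete, here is a minimal sketch of how 
release requests might be held until the container is recovered or the timeout 
passes. It is not the code in the attached patch: the class name 
PendingReleaseCache, its methods, and the use of plain string container ids are 
all hypothetical, and the timeout is only assumed to equal the NM expiry 
interval.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the code in the YARN-2249 patch.
class PendingReleaseCache {
    private final long restartTimeMs; // when the restarted RM came up
    private final long timeoutMs;     // assumed equal to the NM expiry interval
    private final Set<String> pending = new HashSet<>();

    PendingReleaseCache(long restartTimeMs, long timeoutMs) {
        this.restartTimeMs = restartTimeMs;
        this.timeoutMs = timeoutMs;
    }

    // True while we are still inside the caching window after restart.
    private boolean withinWindow(long nowMs) {
        return nowMs - restartTimeMs <= timeoutMs;
    }

    // Called when the AM asks to release a container that the scheduler has
    // not recovered yet. Returns true if the request was cached.
    boolean cacheRelease(String containerId, long nowMs) {
        if (!withinWindow(nowMs)) {
            // Past the window the NM has expired and its containers are
            // killed anyway, so nothing needs to be remembered.
            pending.clear();
            return false;
        }
        pending.add(containerId);
        return true;
    }

    // Called when an NM status report recovers a container. Returns true if
    // a cached release request exists and the container should be released.
    boolean shouldReleaseOnRecovery(String containerId, long nowMs) {
        if (!withinWindow(nowMs)) {
            pending.clear();
            return false;
        }
        return pending.remove(containerId);
    }
}
```

Passing the current time in explicitly keeps the sketch easy to test without 
sleeping; a real implementation would more likely read a clock and purge the 
cache from a timer.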

> RM may receive container release request on AM resync before container is 
> actually recovered
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-2249
>                 URL: https://issues.apache.org/jira/browse/YARN-2249
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch
>
>
> AM resync on RM restart will send outstanding container release requests back 
> to the new RM. In the meantime, NMs report the container statuses back to RM 
> to recover the containers. If RM receives the container release request  
> before the container is actually recovered in scheduler, the container won't 
> be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
