[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629208#comment-14629208
 ] 

Peng Zhang commented on YARN-3535:
----------------------------------

Thanks [~rohithsharma] for updating the patch. 
The patch LGTM.

bq. One point to be clear: the assumption made here is that the ResourceRequest 
is recovered only if the RMContainer is ALLOCATED. If the RMContainer is RUNNING, 
the completed container goes to the AM in the allocate response and the AM asks 
for a new ResourceRequest.

While running on our scale cluster with FS and preemption enabled, MapReduce 
apps work well with this assumption.
I think this assumption also makes sense for other types of apps.
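
To make the quoted assumption concrete, here is a minimal sketch of the 
state-dependent handling. It is not taken from the patch: the class 
RecoverySketch, the method onContainerKilled, and the string-based requests and 
container IDs are all hypothetical stand-ins for the real scheduler types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the assumption; NOT the actual YARN classes.
class RecoverySketch {
    // Only the two states discussed in the comment are modeled.
    enum ContainerState { ALLOCATED, RUNNING }

    // Requests handed back to the scheduler so they can be satisfied again.
    final List<String> pendingRequests = new ArrayList<>();
    // Completed containers to report to the AM in its next allocate response.
    final List<String> completedForAm = new ArrayList<>();

    void onContainerKilled(String containerId, String originalRequest,
                           ContainerState state) {
        if (state == ContainerState.ALLOCATED) {
            // Killed at ALLOCATED: the scheduler restores the original request.
            pendingRequests.add(originalRequest);
        } else {
            // Killed at RUNNING: the AM is told about the completed container
            // and is expected to ask for a new ResourceRequest itself.
            completedForAm.add(containerId);
        }
    }

    public static void main(String[] args) {
        RecoverySketch sketch = new RecoverySketch();
        sketch.onContainerKilled("container_1", "request_1", ContainerState.ALLOCATED);
        sketch.onContainerKilled("container_2", "request_2", ContainerState.RUNNING);
        System.out.println("restored to scheduler: " + sketch.pendingRequests);
        System.out.println("reported to AM: " + sketch.completedForAm);
    }
}
```

In this sketch only the container killed at ALLOCATED has its request restored; 
the one killed at RUNNING is left for the AM to re-request.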

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>            Priority: Critical
>         Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NM, the AM failed to start a container on the NM, 
> and the job then hung.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
