Rohith commented on YARN-3535:

Thanks [~peng.zhang] for working on this issue.
Some comments:
# I think the method {{recoverResourceRequestForContainer}} should be
synchronized. Any thoughts?
# Why are the {{RMContextImpl.java}} changes required? I think we can avoid
them; they do not seem necessary.
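To illustrate the synchronization concern in point 1, here is a minimal sketch. It assumes a simplified, hypothetical model in which pending requests are kept in a plain per-application map that both the AM heartbeat path and the container-kill path mutate. {{PendingRequestsSketch}}, {{updateRequest}} and {{getPending}} are hypothetical names, not the real YARN scheduler API; only the method name {{recoverResourceRequestForContainer}} comes from the patch under review.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of per-application pending requests.
// This is NOT the real YARN scheduler code; it only shows why the recover path
// must hold the same lock as the other paths that mutate the pending requests.
public class PendingRequestsSketch {
    // priority -> number of outstanding containers requested
    private final Map<Integer, Integer> pending = new HashMap<>();

    // Hypothetical: called when the AM heartbeats with new asks (mutates the same map).
    public synchronized void updateRequest(int priority, int numContainers) {
        pending.put(priority, numContainers);
    }

    // Called when an ALLOCATED container is killed before the AM launched it:
    // its request is added back so the scheduler can allocate it again.
    // synchronized so this read-modify-write cannot interleave with updateRequest()
    // or with another concurrent recovery.
    public synchronized void recoverResourceRequestForContainer(int priority) {
        pending.merge(priority, 1, Integer::sum);
    }

    public synchronized int getPending(int priority) {
        return pending.getOrDefault(priority, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        PendingRequestsSketch app = new PendingRequestsSketch();
        List<Thread> threads = new ArrayList<>();
        // 8 threads each recover 1000 killed containers at priority 1.
        for (int i = 0; i < 8; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    app.recoverResourceRequestForContainer(1);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // With synchronization no recovered request is lost: prints 8000.
        System.out.println(app.getPending(1));
    }
}
```

Without {{synchronized}} on the recover method, two concurrent recoveries (or a recovery racing with an AM update) could read the same old count and lose one of the restored requests, which is the kind of lost ask that leaves a job hanging.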

Tests:
# Is there any specific reason for changing {{TestAMRestart.java}}?
# IIUC, this issue can occur in all the schedulers whenever the AM-RM heartbeat
interval is shorter than the NM-RM heartbeat interval. So could you include an FT
test case that applies to both CS and FS? Maybe you can add the test in a class
extending {{ParameterizedSchedulerTestBase}}, i.e. {{TestAbstractYarnScheduler}}.

>  ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>              Labels: BB2015-05-TBR
>         Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
> During a rolling update of the NMs, the AM failed to start a container on an NM,
> and then the job hung.
> AM logs are attached.

This message was sent by Atlassian JIRA
