Jason Lowe commented on YARN-3535:

Yes, the resource request needs to be added back.  That's by far the simplest 
fix.  The AM has no idea the request was fulfilled before the container was 
killed, so from the AM's perspective the request is still outstanding.
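
As a rough illustration of the idea only (this is not the actual scheduler 
code; RestoreRequestSketch, Scheduler, Container and their methods are all 
hypothetical names), a container killed while still in ALLOCATED puts its 
request back in the pending count, while a container the AM has already 
acquired does not:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained model; none of these types are YARN classes.
final class RestoreRequestSketch {
    enum State { ALLOCATED, ACQUIRED }

    static final class Container {
        final int id;
        final String resourceName;      // e.g. a host, a rack, or "*"
        State state = State.ALLOCATED;  // the AM has not pulled it yet

        Container(int id, String resourceName) {
            this.id = id;
            this.resourceName = resourceName;
        }
    }

    static final class Scheduler {
        private final Map<String, Integer> pending = new HashMap<>();
        private int nextId = 1;

        // The AM asks for `count` containers on `resourceName`.
        void request(String resourceName, int count) {
            pending.merge(resourceName, count, Integer::sum);
        }

        // Fulfil one outstanding request, or return null if there is none.
        Container allocate(String resourceName) {
            int left = pending.getOrDefault(resourceName, 0);
            if (left == 0) {
                return null;
            }
            pending.put(resourceName, left - 1);
            return new Container(nextId++, resourceName);
        }

        // Kill a container. If the AM never saw it (still ALLOCATED), the
        // request is still outstanding from the AM's side, so add it back.
        void kill(Container c) {
            if (c.state == State.ALLOCATED) {
                pending.merge(c.resourceName, 1, Integer::sum);
            }
        }

        int pending(String resourceName) {
            return pending.getOrDefault(resourceName, 0);
        }
    }

    public static void main(String[] args) {
        Scheduler s = new Scheduler();
        s.request("*", 1);
        Container c = s.allocate("*");   // pending drops to 0
        s.kill(c);                       // killed at ALLOCATED
        System.out.println(s.pending("*")); // prints 1
    }
}
```

Without the add-back in kill(), the pending count would stay at 0 and nothing 
would ever be allocated to replace the lost container.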

I'm +1 for adding a new flag indicating whether the NM reconnect is 
container-preserving or not, as long as we work through the upgrade scenarios 
to verify we don't introduce regressions.
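
To make the upgrade concern concrete, here is a minimal sketch of how such a 
flag might be interpreted.  Everything in it is an assumption for 
illustration: the names are hypothetical, and in particular the choice to 
treat a missing flag (an NM that predates the change) as not 
container-preserving is only one possible default that would need to be 
checked against the upgrade scenarios.

```java
import java.util.Optional;

// Hypothetical model of the proposed flag; not the real RM/NM protocol.
final class ReconnectFlagSketch {
    enum Action { KEEP_CONTAINERS, RELEASE_CONTAINERS }

    // `containerPreserving` is empty when the reconnecting NM is an older
    // version that does not send the flag at all.
    static Action onReconnect(Optional<Boolean> containerPreserving) {
        // Assumed default: an absent flag means "not preserving".
        boolean preserving = containerPreserving.orElse(false);
        return preserving ? Action.KEEP_CONTAINERS : Action.RELEASE_CONTAINERS;
    }

    public static void main(String[] args) {
        System.out.println(onReconnect(Optional.of(true)));  // KEEP_CONTAINERS
        System.out.println(onReconnect(Optional.of(false))); // RELEASE_CONTAINERS
        System.out.println(onReconnect(Optional.empty()));   // RELEASE_CONTAINERS
    }
}
```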

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>         Attachments: syslog.tgz, yarn-app.log
> During a rolling update of the NM, the AM failed to start a container on 
> the NM, and the job then hung.
> AM logs are attached.
