Rohith Sharma K S commented on YARN-3535:

bq. For preemption, a killed container falls into one of two cases: it has 
already been pulled by the AM, or it has not. In the first case, the AM should 
know the container was killed and will re-ask for a container for the task. In 
the case where the container has not been pulled by the AM, the preemption kill 
leads to the same situation as this issue. So I think the request should not be 
recovered twice.
Ah, you are right. If the RMContainer has not been pulled by the AM, its 
state is ALLOCATED. On preempting such an RMContainer, the resource request was 
recovered twice: (1) by the fix in this JIRA, and (2) by the kill-container 
event in CS. So removing *recoverResourceRequestForContainer(cont);* makes 
sense to me.
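
To illustrate the point, here is a minimal sketch of why only one of the two recovery paths should remain. It is a simplified model under stated assumptions: every class and method name below (RecoverOnceSketch, PendingAsks, Container, preempt) is a hypothetical stand-in, not the actual RMContainerImpl or CapacityScheduler code, and the real patch may be structured differently.

```java
// Hypothetical, simplified model of the double-recovery problem.
// None of these types are real Hadoop classes.
final class RecoverOnceSketch {

    enum State { ALLOCATED, ACQUIRED, KILLED }

    /** Stand-in for the scheduler's count of outstanding asks for one app. */
    static final class PendingAsks {
        private int pending;

        void add() { pending++; }

        /** Consumes one ask when a container is allocated for it. */
        boolean take() {
            if (pending == 0) {
                return false;
            }
            pending--;
            return true;
        }

        int pending() { return pending; }
    }

    /** Stand-in for a container tracked by the RM. */
    static final class Container {
        private State state = State.ALLOCATED;
        private final PendingAsks asks;

        Container(PendingAsks asks) { this.asks = asks; }

        State state() { return state; }

        /** The AM pulls the container; from here on the AM re-asks itself. */
        void acquireByAm() {
            if (state == State.ALLOCATED) {
                state = State.ACQUIRED;
            }
        }

        /**
         * Kill transition. The ask is restored here, and only when the
         * container was still ALLOCATED (never seen by the AM).
         */
        void kill() {
            if (state == State.ALLOCATED) {
                asks.add();
            }
            state = State.KILLED;
        }
    }

    /**
     * Preemption path. It only kills the container; restoring the ask here
     * as well (an extra asks.add()) would recover the request twice.
     */
    static void preempt(Container c) {
        c.kill();
    }
}
```

In this model, preempting a container that is still ALLOCATED restores exactly one ask, while preempting one that the AM has already acquired restores none, because the AM is expected to re-ask on its own.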

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>            Priority: Critical
>         Attachments: 0003-YARN-3535.patch, YARN-3535-001.patch, 
> YARN-3535-002.patch, syslog.tgz, yarn-app.log
> During a rolling update of the NM, the AM failed to start a container on the NM, 
> and the job then hung.
> AM logs are attached.

This message was sent by Atlassian JIRA
