[
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627576#comment-14627576
]
Rohith Sharma K S commented on YARN-3535:
-----------------------------------------
bq. For preemption, a killed container falls into one of two cases: it was
already pulled by the AM, or it was not. In the first case, the AM should know
the container was killed and will re-ask for a container for the task. In the
second case, where the AM has not pulled the container, the preemption kill
leads to the same situation as this issue. So I think it should not be
recovered twice.
Ah, you are right. Basically, if the RMContainer has not been pulled by the AM,
its state is ALLOCATED. On preempting such an RMContainer, the resource request
was recovered twice: (1) by this JIRA's fix, and (2) by the kill-container
event in CS. So removing *recoverResourceRequestForContainer(cont);* makes
sense to me.
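To illustrate the double recovery described above, here is a minimal toy sketch in Java. It does not use the real YARN classes: DoubleRecoverySketch, State, pendingAfterPreemption and the recoverInKillEvent flag are all hypothetical names, and the two recovery paths are only modelled on the description in this comment.

```java
public class DoubleRecoverySketch {
    // Simplified container states; the real RMContainer state machine has more.
    enum State { ALLOCATED, ACQUIRED }

    /**
     * Simulates preempting a single container and returns how many requests
     * the toy scheduler holds as pending afterwards.
     *
     * @param stateAtKill        container state when the kill arrives (assumed model)
     * @param recoverInKillEvent whether the kill-event path also restores the request
     */
    static int pendingAfterPreemption(State stateAtKill, boolean recoverInKillEvent) {
        int pending = 1; // the AM asked for one container
        pending--;       // the scheduler allocated it, so nothing is pending

        // Path 1 (modelled on this JIRA's fix): restore the request when a
        // container is killed while still ALLOCATED, i.e. not pulled by the AM.
        if (stateAtKill == State.ALLOCATED) {
            pending++;
        }

        // Path 2 (modelled on the kill-container handling in the scheduler):
        // restore the request again if this path is still enabled.
        if (recoverInKillEvent) {
            pending++;
        }
        return pending;
    }

    public static void main(String[] args) {
        // Both paths active: the single ask is restored twice.
        System.out.println(pendingAfterPreemption(State.ALLOCATED, true));
        // Kill-event recovery removed: the ask is restored exactly once.
        System.out.println(pendingAfterPreemption(State.ALLOCATED, false));
    }
}
```

In this toy model, keeping both paths leaves two pending requests for a single original ask, while dropping the kill-event recovery leaves one, which is the behaviour the removal is meant to achieve.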
> ResourceRequest should be restored back to scheduler when RMContainer is
> killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Peng Zhang
> Assignee: Peng Zhang
> Priority: Critical
> Attachments: 0003-YARN-3535.patch, YARN-3535-001.patch,
> YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NMs, the AM failed to start a container on an NM,
> and the job then hung.
> AM logs are attached.