[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901553#comment-15901553
]
Jason Lowe commented on YARN-4051:
----------------------------------
Thanks for updating the patch! In the future, please don't delete patches and
re-upload them with the same name. It can lead to very confusing cases where
Jenkins comments on a patch that happens to have the same name as one of the
current attachments but isn't actually the patch that was tested.
The following code won't actually cause it to ignore the FINISH_APPS event.
The {{continue}} in the for loop is degenerate, so all this does is log
warnings but otherwise is semantically the same logic:
{code}
for (Container container : app.getContainers().values()) {
  if (container.isRecovering()) {
    LOG.warn("drop FINISH_APPS event to " + appID + "because container "
        + container.getContainerId() + "is recovering");
    continue;
  }
}
{code}
Also, this shouldn't be a warning since nothing is actually wrong when this
happens, correct? Similarly, the warn log emitted when ignoring the
FINISH_CONTAINERS event looks like it should be an info log at best.
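To illustrate the point about the degenerate {{continue}}, here is a minimal,
self-contained sketch. Everything in it is a hypothetical stand-in and not
NodeManager code: FakeContainer, the two method names, the boolean return value
(true meaning the event goes on to be processed), and the list used in place of
a logger are all assumptions made for the example. It only contrasts a
{{continue}} at the end of the loop body with an early return, which is one
possible way to actually skip the event.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FinishAppsSketch {
  /** Hypothetical stand-in for an NM container; only the recovering flag matters. */
  public static final class FakeContainer {
    final String id;
    final boolean recovering;

    public FakeContainer(String id, boolean recovering) {
      this.id = id;
      this.recovering = recovering;
    }

    boolean isRecovering() {
      return recovering;
    }
  }

  /** Same shape as the loop in the patch: the event is still processed. */
  public static boolean processedWithContinue(List<FakeContainer> containers,
      List<String> log) {
    for (FakeContainer c : containers) {
      if (c.isRecovering()) {
        // Stand-in for an info-level log call.
        log.add("container " + c.id + " is recovering");
        continue; // last statement in the loop body, so it changes nothing
      }
    }
    return true; // control always reaches here; the event is not dropped
  }

  /** One possible alternative: return before the event is processed. */
  public static boolean processedWithEarlyReturn(List<FakeContainer> containers,
      List<String> log) {
    for (FakeContainer c : containers) {
      if (c.isRecovering()) {
        log.add("skipping FINISH_APPS: container " + c.id + " is recovering");
        return false; // leaves the method, so the event really is skipped
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<FakeContainer> containers = Arrays.asList(
        new FakeContainer("c1", false), new FakeContainer("c2", true));
    System.out.println(processedWithContinue(containers, new ArrayList<>()));
    System.out.println(processedWithEarlyReturn(containers, new ArrayList<>()));
  }
}
```

With one recovering container in the list, the first method still reports the
event as processed while the second does not. Whether an early return, a flag
checked after the loop, or some other approach fits the real handler is a
judgment call for the patch.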
I'm also wondering about the scenario where the kill event is coming in from an
AM and not the RM. If a container is still in the recovering state when we
open up the client service for new requests, it seems a client (e.g., an AM) could
come in and ask for a still-recovering container to be killed. I think the
container process will be orphaned if that occurs, since the NM will mistakenly
believe the container has not been launched yet.
> ContainerKillEvent is lost when container is In New State and is recovering
> ----------------------------------------------------------------------------
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: sandflee
> Assignee: sandflee
> Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch,
> YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch, YARN-4051.06.patch
>
>
> As in YARN-4050, the NM event dispatcher is blocked while a container is in
> the New state. When we finish the application, the container is still alive
> even after the NM event dispatcher is unblocked.