[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901553#comment-15901553 ]
Jason Lowe commented on YARN-4051:
----------------------------------

Thanks for updating the patch! In the future, please don't delete patches and re-upload them with the same name. It can lead to very confusing cases where Jenkins comments on a patch that happens to have the same name as one of the current attachments but isn't actually the patch that was tested.

The following code won't actually cause it to ignore the FINISH_APPS event. The {{continue}} in the for loop is degenerate, so all this does is log warnings but otherwise is semantically the same logic:
{code}
for (Container container : app.getContainers().values()) {
  if (container.isRecovering()) {
    LOG.warn("drop FINISH_APPS event to " + appID + "because container "
        + container.getContainerId() + "is recovering");
    continue;
  }
}
{code}

Also this shouldn't be a warning since it's not actually wrong when this happens, correct? Similarly the warn log when ignoring the FINISH_CONTAINERS event seems like that should just be an info log at best.

I'm also wondering about the scenario where the kill event is coming in from an AM and not the RM. If a container is still in the recovering state when we open up the client service for new requests it seems a client (e.g.: AM) could come in and ask for a still-recovering container to be killed. I think the container process will be orphaned if that occurs, since the NM will mistakenly believe the container has not been launched yet.
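To illustrate the point about the degenerate {{continue}}, here is a minimal, self-contained sketch. It is not the NodeManager code: {{FakeContainer}}, {{degenerateHandle}} and {{handleWithEarlyExit}} are hypothetical names invented for this example, and the boolean return value simply stands for "the event went on to be processed". It assumes the intent of the patch is to skip the event entirely when any container is still recovering.

```java
import java.util.List;

public class FinishAppsSketch {

  /** Hypothetical stand-in for an NM container; not the real YARN interface. */
  static final class FakeContainer {
    final String id;
    final boolean recovering;

    FakeContainer(String id, boolean recovering) {
      this.id = id;
      this.recovering = recovering;
    }

    boolean isRecovering() {
      return recovering;
    }
  }

  /**
   * Mirrors the shape of the quoted loop. The continue is the last statement
   * in the loop body, so it only moves on to the next container. Control
   * always falls through to the code after the loop, and the event is
   * processed regardless. Returns true when the event was processed.
   */
  static boolean degenerateHandle(List<FakeContainer> containers, List<String> log) {
    for (FakeContainer c : containers) {
      if (c.isRecovering()) {
        log.add("drop FINISH_APPS: container " + c.id + " is recovering");
        continue; // no effect: nothing follows in the loop body
      }
    }
    return true; // reached even when a container is recovering
  }

  /**
   * One possible alternative (an assumption about the intent): leave the
   * handler as soon as a recovering container is found, so the event is
   * really skipped. Returns false when the event was dropped.
   */
  static boolean handleWithEarlyExit(List<FakeContainer> containers, List<String> log) {
    for (FakeContainer c : containers) {
      if (c.isRecovering()) {
        log.add("drop FINISH_APPS: container " + c.id + " is recovering");
        return false; // exits the handler, not just the loop iteration
      }
    }
    return true; // no recovering container: process the event
  }
}
```

With one recovering container in the list, {{degenerateHandle}} logs a message and still returns true, while {{handleWithEarlyExit}} logs once and returns false. Whether dropping the event outright is the right behavior is a separate question from the control-flow bug.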
> ContainerKillEvent is lost when container is In New State and is recovering
> ----------------------------------------------------------------------------
>
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch, YARN-4051.06.patch
>
>
> As in YARN-4050, NM event dispatcher is blocked, and container is in New state, when we finish application, the container still alive even after NM event dispatcher is unblocked.