[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901553#comment-15901553 ]

Jason Lowe commented on YARN-4051:
----------------------------------

Thanks for updating the patch!  In the future, please don't delete patches and 
re-upload them with the same name.  It can lead to very confusing cases where 
Jenkins comments on a patch that happens to have the same name as one of the 
current attachments but isn't actually the patch that was tested.

The following code won't actually cause the FINISH_APPS event to be ignored.  
The {{continue}} is the last statement in the for loop body, so it has no 
effect: it only moves on to the next container.  All this does is log 
warnings; the event is then handled exactly as before:
{code}
        for (Container container : app.getContainers().values()) {
          if (container.isRecovering()) {
            LOG.warn("drop FINISH_APPS event to " + appID + "because container "
                + container.getContainerId() + "is recovering");
            continue;
          }
        }
{code}

Also, this shouldn't be a warning, since nothing is actually wrong when this 
happens, correct?  Similarly, the warn log emitted when ignoring the 
FINISH_CONTAINERS event should be an info log at most.
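
For illustration, here is a minimal sketch of one way the check could take 
effect: collect the recovering containers inside the loop, then decide after 
the loop whether to skip the event.  Everything here is an assumption made for 
the example, not the actual NM code: {{RecoveringContainer}}, 
{{FinishAppsGuard}}, {{handleFinishApp}} and the {{container}} factory are 
hypothetical stand-ins, and {{System.out}} stands in for an info-level log so 
the sketch has no dependencies.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the NM container type; only the two methods
// used in the quoted snippet are modeled.
interface RecoveringContainer {
  String getContainerId();
  boolean isRecovering();
}

final class FinishAppsGuard {
  private FinishAppsGuard() {}

  // Test/demo helper: builds a container with a fixed ID and recovery flag.
  static RecoveringContainer container(final String id,
      final boolean recovering) {
    return new RecoveringContainer() {
      @Override public String getContainerId() { return id; }
      @Override public boolean isRecovering() { return recovering; }
    };
  }

  // Returns the IDs of containers that are still recovering.  A non-empty
  // result means the caller should skip the FINISH_APPS event for this app.
  static List<String> recoveringContainerIds(
      Collection<? extends RecoveringContainer> containers) {
    List<String> ids = new ArrayList<>();
    for (RecoveringContainer c : containers) {
      if (c.isRecovering()) {
        ids.add(c.getContainerId());
      }
    }
    return ids;
  }

  // Sketch of the caller.  The decision is made after the loop, so it
  // affects the enclosing handler instead of just one loop iteration.
  // Returns true if the event was processed, false if it was dropped.
  static boolean handleFinishApp(String appId,
      Collection<? extends RecoveringContainer> containers) {
    List<String> recovering = recoveringContainerIds(containers);
    if (!recovering.isEmpty()) {
      // Would be an info-level log in real code.
      System.out.println("Dropping FINISH_APPS event for " + appId
          + " because containers " + recovering + " are recovering");
      return false;
    }
    // ... dispatch the application finish event here ...
    return true;
  }
}
```

With this shape, a single recovering container causes the whole event to be 
skipped, and the message is logged once per application rather than once per 
container.  A boolean flag set inside the original loop, followed by a check 
after it, would work equally well.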

I'm also wondering about the scenario where the kill event comes from an 
AM rather than the RM.  If a container is still in the recovering state when we 
open up the client service for new requests, a client (e.g., an AM) could 
come in and ask for a still-recovering container to be killed.  I think the 
container process will be orphaned if that occurs, since the NM will mistakenly 
believe the container has not been launched yet.

> ContainerKillEvent is lost when container is  In New State and is recovering
> ----------------------------------------------------------------------------
>
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch, YARN-4051.06.patch
>
>
> As in YARN-4050, when the NM event dispatcher is blocked and a container is 
> in the New state, finishing the application leaves the container alive even 
> after the NM event dispatcher is unblocked.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

