[ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941233#comment-14941233
 ] 

Jason Lowe commented on YARN-4051:
----------------------------------

Thanks for the patch!  Sorry for the delay, as I missed this when it was 
originally filed.

I'm lukewarm on an event buffering approach, since we would have to track the 
buffered event and remember to propagate it at all the appropriate times, which 
is a maintenance burden.  Would it be simpler to prevent kill requests from 
arriving before we're done recovering containers?  We could return a "try again" 
response, similar to what we do for container start requests while still 
recovering, and postpone finish application processing until after containers 
are recovered.
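
For illustration only, a minimal sketch of the "try again" idea in Java.  The 
names here (RecoveryGateSketch, ContainerRequestGate, NotYetReadyException) are 
hypothetical and are not taken from the NodeManager code; the sketch only shows 
kill requests being rejected while recovery is in progress and finish 
application processing being postponed until recovery completes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these class and method names are illustrative only
// and are not taken from the NodeManager code base.
class RecoveryGateSketch {

  /** Hypothetical retriable error that tells the caller to try again later. */
  static class NotYetReadyException extends Exception {
    NotYetReadyException(String message) {
      super(message);
    }
  }

  /** Stand-in for the component that receives kill and finish-app requests. */
  static class ContainerRequestGate {
    private boolean recovering = true;
    private final List<String> killedContainers = new ArrayList<>();
    private final List<String> deferredApps = new ArrayList<>();
    private final List<String> finishedApps = new ArrayList<>();

    /** Rejects the kill while recovery is in progress instead of buffering it. */
    synchronized void stopContainer(String containerId)
        throws NotYetReadyException {
      if (recovering) {
        throw new NotYetReadyException(
            "Still recovering containers, retry kill of " + containerId);
      }
      killedContainers.add(containerId);
    }

    /** Postpones finish-application processing until recovery is done. */
    synchronized void finishApplication(String appId) {
      if (recovering) {
        deferredApps.add(appId);
      } else {
        finishedApps.add(appId);
      }
    }

    /** Called once every container has been recovered. */
    synchronized void recoveryComplete() {
      recovering = false;
      finishedApps.addAll(deferredApps);
      deferredApps.clear();
    }

    synchronized List<String> killedContainers() {
      return new ArrayList<>(killedContainers);
    }

    synchronized List<String> finishedApps() {
      return new ArrayList<>(finishedApps);
    }
  }

  public static void main(String[] args) throws Exception {
    ContainerRequestGate gate = new ContainerRequestGate();
    try {
      gate.stopContainer("container_1");
    } catch (NotYetReadyException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    gate.finishApplication("app_1");   // deferred, not processed yet
    gate.recoveryComplete();           // deferred finish is processed here
    gate.stopContainer("container_1"); // retry now succeeds
    System.out.println(gate.killedContainers() + " " + gate.finishedApps());
  }
}
```

In this sketch the caller owns the retry, so nothing on the receiving side has 
to remember a pending kill.  The same sequence (kill rejected during recovery, 
then accepted afterwards) is also the shape a unit test for the scenario could 
take.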

However we decide to fix this, there should be a unit test to cover the 
scenario.

> ContainerKillEvent is lost when container is In New State and is recovering
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch
>
>
> As in YARN-4050, the NM event dispatcher is blocked and the container is in 
> the New state. When we finish the application, the container is still alive 
> even after the NM event dispatcher is unblocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
