[ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907544#comment-15907544
 ] 

sandflee commented on YARN-4051:
--------------------------------

Thanks [~jlowe],  
bq. I'm also wondering about the scenario where the kill event is coming in 
from an AM and not the RM. 
We simply throw a YarnException when the AM stops a recovering container, but 
it seems NMClientAsyncImpl can't try stopContainer again. Could we fix this in 
a new issue? 
{code}
            .addTransition(ContainerState.RUNNING,
                EnumSet.of(ContainerState.DONE, ContainerState.FAILED),
                ContainerEventType.STOP_CONTAINER,
                new StopContainerTransition())
{code}
I also made two other changes:
1. Use app.handle(new ApplicationContainerInitEvent(container)) when recovering 
containers, because there is a race condition: if the finish event arrives before 
ApplicationContainerInitEvent has been processed, the containers have not yet 
been added to the app.
2. Use a ConcurrentHashMap to store containers in the app, because I encountered 
a ConcurrentModificationException when iterating over app.getContainers(), and I 
also see the web UI and AppLogAggregator using app.getContainers() without 
protection.
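
The second change can be illustrated with a minimal, single-threaded Java sketch. It is not the NodeManager code: the class name ContainerMapDemo, its methods, and the container ids are hypothetical and exist only for illustration. It shows that removing an entry while iterating a HashMap triggers a ConcurrentModificationException, whereas the same access pattern on a ConcurrentHashMap does not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not part of the NodeManager code base.
public class ContainerMapDemo {

    // Fills the given map with five made-up container ids.
    public static Map<String, String> populate(Map<String, String> containers) {
        for (int i = 1; i <= 5; i++) {
            containers.put("container_" + i, "RUNNING");
        }
        return containers;
    }

    // Iterates over the map while removing an entry from it, the way a
    // finishing container could be removed while a reader is iterating.
    // Returns true if the iteration failed with ConcurrentModificationException.
    public static boolean failsOnRemovalDuringIteration(Map<String, String> containers) {
        try {
            for (String id : containers.keySet()) {
                // Structural modification made outside of the iterator.
                containers.remove("container_3");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap fails: "
            + failsOnRemovalDuringIteration(populate(new HashMap<>())));
        System.out.println("ConcurrentHashMap fails: "
            + failsOnRemovalDuringIteration(populate(new ConcurrentHashMap<>())));
    }
}
```

With real concurrent readers and writers, HashMap's fail-fast check is only best effort, so the exception may appear intermittently. ConcurrentHashMap iterators are weakly consistent: they never throw ConcurrentModificationException, but a reader may or may not observe a removal that happens during the iteration.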

> ContainerKillEvent is lost when container is  In New State and is recovering
> ----------------------------------------------------------------------------
>
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch, 
> YARN-4051.06.patch, YARN-4051.07.patch
>
>
> As in YARN-4050, the NM event dispatcher is blocked and the container is in 
> the NEW state. When we finish the application, the container is still alive 
> even after the NM event dispatcher is unblocked.



