[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531690#comment-16531690 ]
Sunil Govindan commented on YARN-8473:
--------------------------------------
Thanks [~jlowe] for analyzing this and sharing the patch. I have one question
about the patch.
In the default case, a ContainerKillEvent is now raised, stating that the app is
not running and the container is therefore being killed. In which scenarios can
a container reach this default case? I think common error handling is much
safer here because it avoids leaving orphaned containers. However, could we
also add some error logs that print the container ID, the states, etc., to help
debug such cases?
> Containers being launched as app tears down can leave containers in NEW state
> -----------------------------------------------------------------------------
>
> Key: YARN-8473
> URL: https://issues.apache.org/jira/browse/YARN-8473
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.8.4
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Major
> Attachments: YARN-8473.001.patch, YARN-8473.002.patch
>
>
> I saw a case where containers were stuck on a nodemanager in the NEW state
> because they tried to launch just as an application was tearing down. The
> container sent an INIT_CONTAINER event to the ApplicationImpl, which then
> executed an invalid transition since that event is not handled or expected when
> the application is in the process of tearing down.
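The failure mode described above can be modeled in a few lines of Java. This is a hypothetical, simplified sketch: `StrictApp`, `SketchContainer`, `RaceDemo` and the two-value state enums are invented for illustration and are not YARN classes. It also assumes the dispatcher logs and drops an invalid transition rather than propagating it, so a container whose INIT_CONTAINER arrives during teardown is never moved out of NEW.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified states; not YARN's ApplicationState/ContainerState enums.
enum AppPhase { RUNNING, TEARING_DOWN }

enum ContainerPhase { NEW, LOCALIZING }

class InvalidTransitionException extends RuntimeException {
    InvalidTransitionException(String msg) {
        super(msg);
    }
}

class SketchContainer {
    final String id;
    ContainerPhase phase = ContainerPhase.NEW;

    SketchContainer(String id) {
        this.id = id;
    }
}

// Hypothetical app with a strict transition table: INIT_CONTAINER is only valid while RUNNING.
class StrictApp {
    private static final Set<AppPhase> ACCEPTS_INIT = EnumSet.of(AppPhase.RUNNING);
    AppPhase phase = AppPhase.RUNNING;

    void initContainer(SketchContainer c) {
        if (!ACCEPTS_INIT.contains(phase)) {
            throw new InvalidTransitionException("INIT_CONTAINER is not valid in state " + phase);
        }
        // Only the app moves the container forward; nothing else will.
        c.phase = ContainerPhase.LOCALIZING;
    }
}

class RaceDemo {
    // Assumed dispatcher behavior: log and drop an invalid transition.
    static void dispatch(StrictApp app, SketchContainer c) {
        try {
            app.initContainer(c);
        } catch (InvalidTransitionException e) {
            System.err.println("Dropped event for " + c.id + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        StrictApp app = new StrictApp();
        SketchContainer early = new SketchContainer("container_1");
        dispatch(app, early);

        app.phase = AppPhase.TEARING_DOWN; // the app starts tearing down
        SketchContainer late = new SketchContainer("container_2");
        dispatch(app, late);

        System.out.println(early.id + " -> " + early.phase); // container_1 -> LOCALIZING
        System.out.println(late.id + " -> " + late.phase);   // container_2 -> NEW
    }
}
```

In this model nothing else ever advances `container_2`, which mirrors the stuck-in-NEW symptom in the description.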
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)