[
https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068996#comment-15068996
]
Jun Gong commented on YARN-4459:
--------------------------------
Thanks [~Naganarasimha] for the info. Yes, it seems to be the same problem. I
think the problem is not limited to 'DelayedProcessKiller'; it might occur on
every call to 'signal_container_as_user'.
> container-executor might kill process wrongly
> ---------------------------------------------
>
> Key: YARN-4459
> URL: https://issues.apache.org/jira/browse/YARN-4459
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Jun Gong
> Assignee: Jun Gong
> Attachments: YARN-4459.01.patch, YARN-4459.02.patch
>
>
> When 'signal_container_as_user' is called in container-executor, it first
> checks whether the process group exists; if not, it kills the process
> itself (if the process exists). This is not reasonable: if the process
> group does not exist, the corresponding container has already finished, so
> killing the process itself just kills the wrong process.
> We have seen this happen in our cluster many times. We used the same account
> for starting the NM and for submitting the app, and container-executor
> sometimes killed the NM (the wrongly killed process might just have been a
> newly started thread that was a child process of the NM).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)