[ https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
gu-chi resolved YARN-4536.
--------------------------
Resolution: Not A Problem
On further analysis, this turned out to be introduced by a custom modification
on our side. Sorry for the noise.
> DelayedProcessKiller may not work under heavy workload
> ------------------------------------------------------
>
> Key: YARN-4536
> URL: https://issues.apache.org/jira/browse/YARN-4536
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.1
> Reporter: gu-chi
>
> I am seeing orphaned container processes. Here is the scenario:
> Under heavy task load, CPU usage on the NM machine can reach almost 100%. When
> a container receives a kill event, it is sent {{SIGTERM}}, and the parent
> process then exits, leaving the container process to the OS. The container
> process still needs to handle shutdown events or other logic, but it can
> hardly get any CPU time. We would expect a {{SIGKILL}} to follow, since there
> is a {{DelayedProcessKiller}}, but the parent process whose pid was recorded
> as the container pid no longer exists, so the kill command cannot reach the
> container process. This is how the orphaned container process arises.
> The orphaned process does exit eventually, but this can take a very long
> time and makes the state of the OS worse. In my observation, it can take
> several hours.
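
For readers unfamiliar with the pattern described above, here is a minimal
sketch of a SIGTERM-then-delayed-SIGKILL sequence. It is not the NodeManager's
{{DelayedProcessKiller}} code; it is a Python illustration under stated
assumptions. The function name terminate_then_kill and the grace_seconds
parameter are hypothetical. The sketch assumes the child was started in its
own session, so its pid doubles as a process-group id, and it signals the
group rather than a single pid so that the SIGKILL can still reach surviving
processes after the recorded parent has exited.

```python
import os
import signal
import subprocess
import sys


def terminate_then_kill(proc, grace_seconds=0.5):
    """Send SIGTERM to proc's process group, then SIGKILL after a grace period.

    Hypothetical helper for illustration only. Assumes proc was started with
    start_new_session=True, so proc.pid is also its process-group id.
    """
    pgid = proc.pid
    os.killpg(pgid, signal.SIGTERM)
    try:
        # Give the group leader a chance to shut down cleanly.
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass  # leader is still running; fall through to SIGKILL
    try:
        # Signal the whole group, so processes that outlived the leader
        # are still reached.
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # every process in the group has already exited
    return proc.wait()
```

A caller would start the child with subprocess.Popen(..., start_new_session=True)
and pass the resulting object to terminate_then_kill. Signaling only the
leader's pid instead of the group would reproduce the failure mode reported
here: once the leader is gone, the follow-up kill has no target.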
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)