[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560663#comment-14560663 ]

Hong Zhiguo commented on YARN-3678:
-----------------------------------

First, "stop container" happens frequently.
Second, the PID space doesn't need to wrap around completely within the 250ms
window. It only needs to wrap around one or more times during the container's
lifetime.

If "stop container" happens 100 times per day on one node, the chance is
100/32768, about 0.3%, per node per day. That's not negligible, especially
with 5000 nodes.
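The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes, as the comment implicitly does, that each delayed kill independently has a 1-in-32768 chance of landing on a reused PID. Here 32768 is the traditional Linux default `pid_max` (some distributions raise it). The constant and function names are hypothetical. The exact per-node figure comes out slightly below the linear 100/32768 because it uses 1 - (1 - 1/32768)^100.

```python
# Back-of-the-envelope PID-reuse risk, under the comment's simplifying model:
# each delayed kill has an independent 1/PID_MAX chance of hitting a reused PID.
PID_MAX = 32768               # traditional Linux default pid_max (assumption: unchanged)
STOPS_PER_NODE_PER_DAY = 100  # figure taken from the comment
NODES = 5000                  # cluster size taken from the comment


def per_node_daily_risk(stops: int, pid_max: int = PID_MAX) -> float:
    """Chance that at least one delayed kill on one node hits a reused PID in a day."""
    per_stop = 1.0 / pid_max
    return 1.0 - (1.0 - per_stop) ** stops


def cluster_daily_stats(nodes: int, p_node: float) -> tuple:
    """Return (expected number of affected nodes per day, chance at least one is hit)."""
    expected = nodes * p_node
    at_least_one = 1.0 - (1.0 - p_node) ** nodes
    return expected, at_least_one


linear = STOPS_PER_NODE_PER_DAY / PID_MAX
p = per_node_daily_risk(STOPS_PER_NODE_PER_DAY)
expected, any_hit = cluster_daily_stats(NODES, p)

print(f"linear estimate        : {linear:.4%}")
print(f"per node per day       : {p:.4%}")
print(f"expected affected nodes: {expected:.1f} per day across {NODES} nodes")
print(f"P(at least one per day): {any_hit:.6f}")
```

Under this model, the 0.3% per node turns into an expected 15 or so affected nodes per day across 5000 nodes. That is why a small per-node figure is still worth fixing at cluster scale.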


> DelayedProcessKiller may kill other process other than container
> ----------------------------------------------------------------
>
>                 Key: YARN-3678
>                 URL: https://issues.apache.org/jira/browse/YARN-3678
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: gu-chi
>            Priority: Critical
>
> Suppose a container has finished and is being cleaned up. The PID file still
> exists, so cleanup triggers one signalContainer call, which kills the process
> whose PID is in the PID file. But since the container has already finished,
> that PID may now belong to another process, which can cause serious problems.
> In my case, the NM was killed unexpectedly, and what I described may be the
> cause, even though it rarely occurs.
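One common guard against the PID reuse described above is to record the process start time alongside the PID and check it again before sending the signal. This is a Linux-specific sketch that reads `starttime` (field 22 of `/proc/<pid>/stat`). The function names are hypothetical, and the approach is not necessarily what the YARN patch for this issue does. A narrow race window between the check and `os.kill` still remains.

```python
import os
import signal
from typing import Optional


def parse_start_time(stat_line: str) -> int:
    """Return field 22 (starttime, clock ticks since boot) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces or ')',
    # so split after the LAST ')'; the first remaining field is field 3 (state).
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return int(rest[22 - 3])


def read_start_time(pid: int) -> Optional[int]:
    """Start time of a live process, or None if it is gone (or /proc is unavailable)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_start_time(f.read())
    except OSError:  # FileNotFoundError / ProcessLookupError when the process has exited
        return None


def guarded_kill(pid: int, expected_start: int, sig: int = signal.SIGKILL) -> bool:
    """Hypothetical helper: signal pid only if it is still the process recorded earlier.

    Returns False if the PID has exited or now belongs to a different process.
    """
    if read_start_time(pid) != expected_start:
        return False  # gone, or the PID was reused by an unrelated process
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return False  # exited between the check and the signal
    return True
```

In this sketch, the start time would be captured right after the container process is launched, for example with `read_start_time(proc.pid)`, and then passed to `guarded_kill` when the delayed kill fires.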



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
