Naganarasimha G R commented on YARN-4459:

Hi [~hex108],
Thanks for working on this JIRA. I am not from a C background; nevertheless, I 
checked the API of {{kill}} and have a few doubts.
IIUC, the existing code checks whether the container process has created any 
sub-processes and, if so, kills all of them. Otherwise, if it is a single 
process, I presume {{kill(-pid,0)}} will return {{-1}}, and the code then tries 
to kill only the container process ID. Can you confirm this by testing?
I just tested this with the Unix {{kill}} command. What I could understand is 
that {{kill -0 -- -<pid which has children>}} succeeds and {{$?}} is *0*, but 
when I run {{kill -0 -- -<pid which has NO children>}}, the error 
{{bash: kill: (-10967) - No such process}} is reported.
Correct me if my understanding is wrong.
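
To take the shell out of the picture, the same probe can be tried directly from C. The following is only a minimal sketch, not code from container-executor: {{group_exists}}, {{spawn_child}} and {{reap}} are hypothetical helper names, and treating {{EPERM}} as "the group exists" is an assumption. As far as I can tell from the {{kill(2)}} man page, a negative pid addresses a process group, so the result should depend on whether the pid is also a process group ID rather than on whether the process has children.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a process group with ID pgid exists, 0 otherwise.
 * Signal 0 only runs the existence and permission checks; nothing is sent.
 * Assumption: EPERM is treated as "exists but not ours to signal". */
int group_exists(pid_t pgid) {
  assert(pgid > 1); /* -1 would mean "every process", so refuse it */
  if (kill(-pgid, 0) == 0) {
    return 1;
  }
  return errno == EPERM;
}

/* Hypothetical helper: forks a child that sleeps until it is signalled.
 * If own_group is nonzero, the child becomes leader of a new process group.
 * Returns the child's pid, or -1 if fork() fails. */
pid_t spawn_child(int own_group) {
  pid_t pid = fork();
  if (pid == 0) {
    if (own_group) {
      setpgid(0, 0);
    }
    for (;;) {
      pause();
    }
  }
  if (pid > 0 && own_group) {
    setpgid(pid, pid); /* also set from the parent to avoid a race */
  }
  return pid;
}

/* Hypothetical helper: kills and reaps a child started by spawn_child(). */
void reap(pid_t pid) {
  kill(pid, SIGKILL);
  waitpid(pid, NULL, 0);
}
```

With one child started as {{spawn_child(1)}} and another as {{spawn_child(0)}}, I would expect {{group_exists}} to report 1 for the first and 0 for the second, even though neither child has children of its own.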
cc/ @[~vvasudev].

> container-executor might kill process wrongly
> ---------------------------------------------
>                 Key: YARN-4459
>                 URL: https://issues.apache.org/jira/browse/YARN-4459
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-4459.01.patch, YARN-4459.02.patch
> When calling 'signal_container_as_user' in container-executor, it first 
> checks whether the process group exists; if not, it kills the process 
> itself (if the process exists). This is not reasonable, because a missing 
> process group means the corresponding container has finished; if we kill 
> the process itself, we just kill the wrong process.
> We found this happened in our cluster many times. We used the same account 
> for starting the NM and submitting the app, and container-executor sometimes 
> killed the NM (the wrongly killed process might just be a newly started 
> thread that was the NM's child process).
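
The behaviour described above can be sketched in C as follows. This is an illustration under stated assumptions, not the container-executor source and not the attached patch: {{fallback_signal_target}} and {{group_only_signal_target}} are hypothetical names, the second function is only one reading of what the report argues for, and treating {{EPERM}} as "exists" is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Behaviour as described in the report: prefer the process group, but fall
 * back to the single process when no group with that ID exists.
 * Returns the value to hand to kill(): -pid (group), pid (process only),
 * or 0 if neither exists. */
pid_t fallback_signal_target(pid_t pid) {
  assert(pid > 1);
  if (kill(-pid, 0) == 0 || errno == EPERM) {
    return -pid;
  }
  if (kill(pid, 0) == 0 || errno == EPERM) {
    return pid; /* the fallback the report considers unsafe */
  }
  return 0;
}

/* Stricter variant: a missing group is taken to mean the container has
 * finished, so nothing is signalled even if a process with that pid exists. */
pid_t group_only_signal_target(pid_t pid) {
  assert(pid > 1);
  if (kill(-pid, 0) == 0 || errno == EPERM) {
    return -pid;
  }
  return 0;
}
```

For a live process that is not a process group leader, the first function returns the bare pid while the second returns 0; once that process leads its own group, both return {{-pid}}.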

This message was sent by Atlassian JIRA
