[ 
https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682293#comment-14682293
 ] 

Anubhav Dhoot commented on YARN-4046:
-------------------------------------

As per the GNU coreutils [documentation|http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html#kill-invocation], "--" may not be needed, but it looks like not all distros (e.g. Debian) support omitting "--".
{noformat} If a negative pid argument is desired as the first one, it should be 
preceded by --. However, as a common extension to POSIX, -- is not required 
with ‘kill -signal -pid’. {noformat}
So the fix is to always pass "--" before the pid argument, matching the recommendation.
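
To illustrate the idea, here is a minimal sketch in Python (not the NodeManager's Java code). The helper name build_kill_command is hypothetical and is not a Hadoop API. It builds the argument list for kill and always places "--" before the target, so a negative pid (a process group) is never parsed as an option:

```python
from typing import List


def build_kill_command(pid: int, signal: int = 0, process_group: bool = True) -> List[str]:
    """Build an argv for kill(1). Hypothetical helper, not Hadoop's actual code."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # A leading "-" on the target means "the process group with this id".
    target = f"-{pid}" if process_group else str(pid)
    # "--" ends option parsing, so the negative pid is not read as a flag.
    return ["kill", f"-{signal}", "--", target]


# Signal 0 sends nothing; it only checks whether the target exists.
print(build_kill_command(1234))      # ['kill', '-0', '--', '-1234']
print(build_kill_command(1234, 9))   # ['kill', '-9', '--', '-1234']
```

Passing "--" unconditionally is harmless where the extension is supported and required where it is not, which is why one code path should work on both kinds of distro.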

> Applications fail on NM restart on some linux distro because NM container 
> recovery declares AM container as LOST
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4046
>                 URL: https://issues.apache.org/jira/browse/YARN-4046
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Critical
>
> On a Debian machine we have seen NodeManager recovery of containers fail 
> because the signal syntax for a process group may not work. We see errors when 
> checking whether the process is alive during container recovery, which causes the 
> container to be declared LOST (154) on a NodeManager restart.
> The application then fails with the error:
> {noformat}
> Application application_1439244348718_0001 failed 1 times due to Attempt 
> recovered after RM restartAM Container for 
> appattempt_1439244348718_0001_000001 exited with exitCode: 154
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
