Eric Yang commented on YARN-9809:

[~ebadger] Thank you for the patch.  It looks very close to the final 
product.  I have confirmed that the test case failure does not happen if there 
is a sufficient amount of RAM on the testing node.  I also validated that the 
new node manager works with an unpatched resource manager.  However, I could 
not get a failing health check script to cause the node to be registered as 
unhealthy.

Here is my check script:
echo "i am here" > /tmp/hello
exit 1

It would be nice to have a verbose message showing the exit code of the health 
check script in the node manager log file.  The script is executed, but the 
node still shows as healthy.  What am I doing wrong?
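
One thing that may be worth checking, assuming the behavior described in the 
NodeManager documentation applies to this patch: the health checker reports a 
node as unhealthy when the script prints a line beginning with ERROR to 
standard output (or times out), and a non-zero exit code alone is not treated 
as a failure.  A minimal sketch of such a script follows; the function name 
and the message text are hypothetical.

```shell
#!/bin/sh
# Hypothetical variant of the check script above; check_node and the
# message text are illustrative names, not part of the patch.
check_node() {
  # Same marker file as the original script, to prove the script ran.
  echo "i am here" > /tmp/hello
  # Assumption: the NM scans stdout for a line that begins with ERROR
  # and marks the node unhealthy when it finds one.
  echo "ERROR simulated health check failure"
}
check_node
```

If the node is still reported as healthy with a script like this, the problem 
is more likely in how the health status is propagated than in the script.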

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9809.001.patch, YARN-9809.002.patch, 
> YARN-9809.003.patch, YARN-9809.004.patch
> Currently, if the NM is unhealthy when it registers with the RM, it can be 
> scheduled many containers before the first heartbeat. After the first 
> heartbeat, the RM will mark the NM as unhealthy and kill all of the 
> containers.
