[ https://issues.apache.org/jira/browse/YARN-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444876#comment-16444876 ]
Eric Yang commented on YARN-8122:
---------------------------------
[~gsaha] Thank you for the patch. I tried to simulate a cluster with a bad
docker daemon on one of the node managers.
I see that containers are being relaunched at a steady rate. When the health
calculation happens, it does not take into account how many containers have
failed and been retried during container-health-threshold.window. The
calculation is based only on the number of currently running containers.
Hence, the service is reported as healthy instead of unhealthy. I think a more
accurate calculation would be container-health-threshold.percent =
(completed + running containers) / total containers launched within
container-health-threshold.window. A simpler alternative: total failed
containers / total containers launched within
container-health-threshold.window should be less than
1 - container-health-threshold.percent.
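To make the two proposed calculations concrete, here is a rough Python sketch.
It is not the patch's code: the function names, and the assumption that the
threshold is configured as a percentage between 0 and 100, are mine, for
illustration only.

```python
def healthy_percent(completed, running, launched):
    """Percent of containers launched within the window that either
    completed or are still running (hypothetical helper)."""
    if launched == 0:
        # Nothing was launched in the window, so nothing can have failed.
        return 100.0
    return 100.0 * (completed + running) / launched


def is_healthy(failed, launched, threshold_percent):
    """Simplified check: failed / launched must be less than
    1 - threshold, with the threshold assumed to be given in percent."""
    if launched == 0:
        return True
    # Integer arithmetic avoids floating-point trouble near the boundary.
    return failed * 100 < (100 - threshold_percent) * launched


# Bad docker daemon scenario: 10 launches in the window, 7 of them failed
# and were retried, 3 are running right now, threshold is 90 percent.
print(healthy_percent(completed=0, running=3, launched=10))
print(is_healthy(failed=7, launched=10, threshold_percent=90))
```

Counting only the currently running containers hides the 7 failed launches in
this scenario, whereas both window-based checks flag the service as unhealthy.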
The Nginx image relies on supervisor to start its processes, so it will not
work without ENTRYPOINT support, and I cannot get the example to work.
Therefore, I think it would be safer to use centos/httpd-24-centos7 with the
launch command /usr/bin/run-httpd in the example.
> Component health threshold monitor
> ----------------------------------
>
> Key: YARN-8122
> URL: https://issues.apache.org/jira/browse/YARN-8122
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gour Saha
> Assignee: Gour Saha
> Priority: Major
> Attachments: YARN-8122.001.patch, YARN-8122.002.patch,
> YARN-8122.003.patch, YARN-8122.004.patch, YARN-8122.draft.patch
>
>
> Slider supported component health threshold monitoring with SLIDER-1246. It
> would be good to have this feature for YARN Service too.