[jira] [Commented] (YARN-8122) Component health threshold monitor

Eric Yang (JIRA) Thu, 19 Apr 2018 15:06:31 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444876#comment-16444876
 ]


Eric Yang commented on YARN-8122:
---------------------------------

[~gsaha] Thank you for the patch.  I tried to simulate a cluster with a bad 
docker daemon on one of the node managers.  I see that containers are being 
relaunched at a steady rate.  When the calculation happens, it doesn't take 
into account how many containers have failed and been retried during 
container-health-threshold.window.  The calculation is based only on the 
number of currently running containers.  Hence, the service is reporting 
healthy instead of unhealthy.  I think a more accurate calculation would 
compare health-threshold.percent against (completed + running containers) / 
total launched containers within container-health-threshold.window.  Another, 
simplified calculation: total failed containers / total launched containers 
within container-health-threshold.window should be less than 1 - 
health-threshold.percent.
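
The two proposed calculations can be sketched roughly as follows.  This is 
only an illustration, not code from the patch: the WindowCounts type and the 
function names are hypothetical, the threshold is assumed to be a fraction 
between 0 and 1 (as the "1 - health-threshold.percent" form implies), and 
treating a window with no launches as healthy is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Container counts seen within one threshold window (hypothetical type)."""
    launched: int   # every container launched in the window, retries included
    running: int    # containers currently running
    completed: int  # containers that finished successfully
    failed: int     # containers that failed (and may have been retried)


def healthy_ratio(counts: WindowCounts) -> float:
    """(completed + running) / total launched within the window."""
    if counts.launched == 0:
        return 1.0  # assumption: nothing launched is treated as healthy
    return (counts.completed + counts.running) / counts.launched


def is_healthy(counts: WindowCounts, threshold: float) -> bool:
    """First proposal: the healthy ratio must reach the threshold."""
    return healthy_ratio(counts) >= threshold


def is_healthy_by_failures(counts: WindowCounts, threshold: float) -> bool:
    """Simplified proposal: failed / launched must be less than 1 - threshold."""
    if counts.launched == 0:
        return True  # assumption: nothing launched is treated as healthy
    return counts.failed / counts.launched < 1 - threshold


# Bad docker daemon scenario: 4 containers are running right now, but 6 of
# the 10 launches inside the window failed and were retried.
flapping = WindowCounts(launched=10, running=4, completed=0, failed=6)
print(is_healthy(flapping, 0.9))
print(is_healthy_by_failures(flapping, 0.9))
```

A check that looks only at currently running containers would see 4 of 4 
running and report healthy, while both calculations above report unhealthy 
because the failed launches in the window are counted.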

Nginx relies on supervisor to start its processes.  It will not work without 
ENTRY_POINT support, and I cannot get the example to work.  Therefore, I think 
it would be safer to use centos/httpd-24-centos7 with the launch command 
/usr/bin/run-httpd in the example.
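
For reference, a service definition using that image might look roughly like 
the sketch below, which builds the spec as a Python dict and prints it as 
JSON.  Only the image name and launch command come from the comment above; 
the service name, version, container count, and resource sizes are 
illustrative assumptions, and the helper function is hypothetical.

```python
import json


def httpd_service_spec() -> dict:
    """Build an example YARN service spec around the httpd image."""
    return {
        "name": "httpd-service",   # illustrative name, not from the patch
        "version": "1.0.0",        # illustrative version
        "components": [
            {
                "name": "httpd",
                "number_of_containers": 2,  # illustrative count
                "artifact": {
                    "id": "centos/httpd-24-centos7",
                    "type": "DOCKER",
                },
                # The explicit launch command avoids relying on the image's
                # own entry point.
                "launch_command": "/usr/bin/run-httpd",
                "resource": {"cpus": 1, "memory": "1024"},  # illustrative
            }
        ],
    }


print(json.dumps(httpd_service_spec(), indent=2))
```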

> Component health threshold monitor
> ----------------------------------
>
>                 Key: YARN-8122
>                 URL: https://issues.apache.org/jira/browse/YARN-8122
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gour Saha
>            Assignee: Gour Saha
>            Priority: Major
>         Attachments: YARN-8122.001.patch, YARN-8122.002.patch, 
> YARN-8122.003.patch, YARN-8122.004.patch, YARN-8122.draft.patch
>
>
> Slider supported component health threshold monitoring with SLIDER-1246. It 
> would be good to have this feature for YARN Service too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

