> On April 14, 2017, 2:26 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py
> > Lines 163-166 (patched)
> > <https://reviews.apache.org/r/58462/diff/1/?file=1692816#file1692816line163>
> >
> >     This will cause a task to get stuck in `STARTING` since `self.running` 
> > will never be set to `True`.
> >     
> >     Can you explain the particular usecase here? Also add a test case to 
> > exercise this branch.
> 
> Vladimir Khalatyan wrote:
>     The idea is to have the HealthCheck process start only after some of the 
> setup processes have finished. With the current approach it's possible to 
> adjust the "starting" point of the HealthCheck process by changing 
> initial_interval_secs, but that means relying on timing, which doesn't 
> guarantee anything.
>     The idea of HealthCheck "snoozing" is to ignore any status of the 
> healthcheck until some process tells HealthCheck to start checking the 
> health of the service.
>     
>     Example (simplified):
>      Let's assume we start two processes on the machine: the LB registration 
> and the uWSGI process. Let's say the uWSGI process requires some time to 
> warm up. The LB registration depends on the load on the LB, how soon uWSGI 
> warms up, etc. So the actual moment when the application becomes available 
> can vary from a couple of seconds to minutes, and we cannot rely on 
> initial_interval_secs. So we create a .healthchecksnooze file and ignore all 
> results of the healthcheck as long as this file is there. In the meantime 
> the LB registration process will try to register the service some number of 
> times ( < max_failures) and delete the .healthchecksnooze file after it 
> succeeds. From that moment on the healthcheck will start incrementing the 
> consecutive successes or failures, and we can determine whether the 
> deployment is successful or not.
>      So with this approach we can specify the "starting" point of health 
> checking more accurately and make it depend on other processes. 
>      
>      By the "starting" point of the health check I mean the point at which 
> the application's health is checked and the consecutive successes or 
> failures start changing, not the start of the actual system process.

> "So the actual moment when the application becomes available can vary from 
> couple of seconds to minutes and we can not rely on initial_interval_secs."

The current implementation already addresses the problem of 
`initial_interval_secs` not adapting to varying startup times. It performs 
health checks during the startup period (`initial_interval_secs`) and ignores 
all failures in that period, while successful health checks count towards 
transitioning the task to a healthy (RUNNING) state. It can thereby 
accommodate both slow and fast startups without making the faster instances 
wait until the entire `initial_interval_secs` has expired.
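
To make that behaviour concrete, here is a minimal sketch of the counting 
rules described above. It is not the code in `health_checker.py`: the class 
`StartupHealthTracker`, its `record` method and the `elapsed_secs` argument 
are hypothetical names chosen for illustration, and resetting the success 
streak on a failure is an assumption.

```python
class StartupHealthTracker:
    """Hypothetical model of the startup rules; not Aurora's real class."""

    def __init__(self, initial_interval_secs,
                 min_consecutive_successes=1,
                 max_consecutive_failures=0):
        self.initial_interval_secs = initial_interval_secs
        self.min_consecutive_successes = min_consecutive_successes
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_successes = 0
        self.consecutive_failures = 0
        self.running = False   # set once enough successes were seen
        self.healthy = True    # cleared once too many failures were counted

    def record(self, elapsed_secs, check_passed):
        """Record one health check taken elapsed_secs after task start."""
        if check_passed:
            # Successes always count, even inside the startup window.
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            if self.consecutive_successes >= self.min_consecutive_successes:
                self.running = True
            return

        # Assumption: any failure breaks the current success streak.
        self.consecutive_successes = 0
        in_startup_window = elapsed_secs <= self.initial_interval_secs
        if in_startup_window and not self.running:
            # Failures during startup are ignored.
            return
        self.consecutive_failures += 1
        if self.consecutive_failures > self.max_consecutive_failures:
            self.healthy = False
```

With `initial_interval_secs=60`, an instance that passes its checks after 
3 seconds reaches RUNNING right away, while one that only starts passing 
after 40 seconds is not penalised for its earlier failures.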

For your change in particular, however, you might also need to account for 
`_should_enforce_deadline`, which treats a task as unhealthy if it runs 
out of attempts.
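
A rough sketch of that interaction, under the assumption that the deadline 
counts every attempt whether or not it was snoozed. This has not been checked 
against the real `_should_enforce_deadline`; `CheckState`, `is_snoozed`, 
`record_attempt` and the `max_attempts` parameter are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass
class CheckState:
    attempts: int = 0
    consecutive_successes: int = 0
    running: bool = False
    healthy: bool = True


def is_snoozed(snooze_path):
    """Snoozing is active while the snooze file exists."""
    return os.path.exists(snooze_path)


def record_attempt(state, check_passed, snoozed,
                   min_consecutive_successes, max_attempts):
    """Apply one health check result to state and return it."""
    # Assumption: a snoozed attempt still consumes the attempt budget.
    state.attempts += 1

    if not snoozed:
        if check_passed:
            state.consecutive_successes += 1
        else:
            state.consecutive_successes = 0
        if state.consecutive_successes >= min_consecutive_successes:
            state.running = True

    # Deadline: attempts exhausted before the task ever reached RUNNING.
    if not state.running and state.attempts >= max_attempts:
        state.healthy = False
    return state
```

Under these assumptions, a task whose snooze file outlives the attempt budget 
is marked unhealthy even though every underlying check passed, so the deadline 
would need to be paused or extended while snoozing.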


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58462/#review172032
-----------------------------------------------------------


On April 14, 2017, 1:35 p.m., Vladimir Khalatyan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58462/
> -----------------------------------------------------------
> 
> (Updated April 14, 2017, 1:35 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Fix bug. Do not increase current_consecutive_successes if .healthchecksnooze 
> is present
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/common/health_checker.py 
> e9e4129af2db5202a82e9f6d54109a00bbae97ce 
> 
> 
> Diff: https://reviews.apache.org/r/58462/diff/1/
> 
> 
> Testing
> -------
> 
> The Health Check is succeeding when the .healthchecksnooze is present. But it 
> should just snooze which means there shouldn't be any increase in consecutive 
> successes or consecutive failures.
> 
> 
> Thanks,
> 
> Vladimir Khalatyan
> 
>
