-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31104/#review72980
-----------------------------------------------------------


For this change to be useful, we also have to think about the meaning of 
`initial_interval_secs`. In its current form, health checks only start once the 
initial delay has passed. This delay commonly has to be set very high in order 
to guarantee that a task will come up even in a worst-case scenario (e.g., the 
server I pull my binary from is slow today). With your change, however, no 
task would be considered running until this worst-case time window has passed.

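A potential solution would be to change the meaning of `initial_interval_secs`: 
always send health checks from the start, but ignore any failures until the 
interval has passed. A task that comes up early could then be considered 
running as soon as its first health check succeeds.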
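
To make the idea concrete, here is a rough sketch of the semantics I have in 
mind. It is not based on the actual code in `health_checker.py`; the class 
name `GracePeriodHealthCheck`, the `check` callable, and the `clock` parameter 
are all made up for illustration. It also assumes that a failure inside the 
initial interval is ignored even if an earlier check already succeeded, which 
is something we would still have to decide.

```python
import time


class GracePeriodHealthCheck(object):
    """Hypothetical sketch, not the real Aurora health checker."""

    def __init__(self, check, initial_interval_secs, clock=time.time):
        self._check = check  # assumed: callable returning True when healthy
        self._initial_interval_secs = initial_interval_secs
        self._clock = clock
        self._start = clock()
        self.running = False  # set on the first successful check
        self.failed = False   # set on a failure after the initial interval

    def run_once(self):
        """Run one health check, starting immediately after task launch."""
        healthy = self._check()
        elapsed = self._clock() - self._start
        in_initial_interval = elapsed < self._initial_interval_secs
        if healthy:
            # The task may be reported as running before the interval ends.
            self.running = True
        elif not in_initial_interval:
            # Failures only count once the initial interval has passed.
            self.failed = True
        # A failure inside the initial interval is deliberately ignored.
        return healthy
```

With something like this, `initial_interval_secs` can stay at the worst-case 
value without delaying the running state for tasks that start quickly.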

- Stephan Erb


On Feb. 18, 2015, 4:32 a.m., Moses Nakamura wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31104/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 4:32 a.m.)
> 
> 
> Review request for Aurora and Brian Wickman.
> 
> 
> Bugs: AURORA-894
>     https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the first step in changing TASK_RUNNING to mean that the application 
> is alive and responding to health checks (if the task is configured to 
> support health checks).  This review is just to get feedback, I can't do this 
> review in parts because the scheduler must be changed in lockstep with the 
> executor, or everything will break.
> 
> I don't know if this is the right approach, could you give me some high level 
> advice?  I'm also not sure who to add to this review.
> 
> Here is the high level description that we came up with:
> 
> http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201412.mbox/%3CCAOTkfX4KTUpMVcjeFf5%3DvvGXb91to5baNSzvyiwtk-sTddxGXQ%40mail.gmail.com%3E
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 9c0282392dbb9cca308baf47adc1750c1f5cacc6 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> dda76f018f472d7d8228459eb89f4c5daf9df26d 
>   src/main/python/apache/aurora/executor/common/health_checker.py 
> 60676ba0fbd8a218fe4309f07de28e2c66d54530 
>   src/main/python/apache/aurora/executor/common/resource_manager.py 
> 08e02e41b581f275f070228bb23c4cf2a0489f9a 
>   src/main/python/apache/aurora/executor/common/status_checker.py 
> 624921d68199df098ea51ee8a10815403bf58984 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 6b782778e52394de3744b43003226dac3f65169e 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py 
> def249c2509a28f7145380f250f79202b653dc83 
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py 
> 8f288f6115ab52265dfaffffda3f41d81271c55a 
> 
> Diff: https://reviews.apache.org/r/31104/diff/
> 
> 
> Testing
> -------
> 
> This hangs after I call is_health_checks_enabled, and I don't know why.  My 
> suspicion is that I'm throwing an exception and cratering the task executor, 
> but I don't know how to tell.  How do I get it to print?  I'm running it with:
> 
> ./pants test src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Moses Nakamura
> 
>
