> On Feb. 18, 2015, 7 p.m., Stephan Erb wrote:
> > For this change to be useful, we also have to think about the meaning of 
> > `initial_interval_secs`. In its current form, health checks only start when 
> > the initial delay has passed. Commonly this delay has to be set very high 
> > in order to guarantee that a task will come up even in a worst case 
> > scenario (e.g., server where I pull my binary from is slow today). With 
> > your change however, no task would be considered running until this worst 
> > case time window has passed.
> > 
> > A potential solution would be to change the meaning of 
> > `initial_interval_secs` so that health checks are always sent, but any 
> > failures are ignored until the interval has elapsed.

+1, that's a good idea.
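Stephan's proposal can be sketched in Python, the executor's language. This is a minimal sketch under stated assumptions, not Aurora's `health_checker.py`: the class `GracePeriodHealthChecker`, the `check_fn` callable, the injectable `clock`, and the `STARTING`/`RUNNING`/`FAILED` strings are all hypothetical. Health checks run from the first call. A failure inside `initial_interval_secs` is ignored, the task is reported as running as soon as one check succeeds, and only failures after the interval count.

```python
import time

# Hypothetical status labels; the real executor reports Mesos task states.
STARTING, RUNNING, FAILED = 'STARTING', 'RUNNING', 'FAILED'


class GracePeriodHealthChecker(object):
    """Hypothetical sketch: sends health checks from the start, but ignores
    failures until initial_interval_secs has elapsed."""

    def __init__(self, check_fn, initial_interval_secs, clock=time.time):
        self._check_fn = check_fn  # returns True if the application is healthy
        self._initial_interval_secs = initial_interval_secs
        self._clock = clock
        self._start = clock()
        self._seen_healthy = False

    def _in_grace_period(self):
        return self._clock() - self._start < self._initial_interval_secs

    def status(self):
        """Runs one health check and returns the status to report."""
        if self._check_fn():
            # The first success is what would move the task to RUNNING.
            self._seen_healthy = True
            return RUNNING
        if self._in_grace_period():
            # Failures during the initial interval are ignored.
            return RUNNING if self._seen_healthy else STARTING
        # Outside the initial interval, a failed check counts.
        return FAILED
```

Injecting the clock keeps the sketch testable without sleeping. A real executor would also have to decide how many consecutive failures to tolerate after the interval, which this sketch does not model.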


- Moses


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31104/#review72980
-----------------------------------------------------------


On Feb. 18, 2015, 4:32 a.m., Moses Nakamura wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31104/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 4:32 a.m.)
> 
> 
> Review request for Aurora and Brian Wickman.
> 
> 
> Bugs: AURORA-894
>     https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the first step in changing TASK_RUNNING to mean that the application 
> is alive and responding to health checks (if the task is configured to 
> support health checks).  This review is just to get feedback, I can't do this 
> review in parts because the scheduler must be changed in lockstep with the 
> executor, or everything will break.
> 
> I don't know if this is the right approach; could you give me some high-level 
> advice?  I'm also not sure who to add to this review.
> 
> Here is the high-level description that we came up with:
> 
> http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201412.mbox/%3CCAOTkfX4KTUpMVcjeFf5%3DvvGXb91to5baNSzvyiwtk-sTddxGXQ%40mail.gmail.com%3E
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 9c0282392dbb9cca308baf47adc1750c1f5cacc6 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> dda76f018f472d7d8228459eb89f4c5daf9df26d 
>   src/main/python/apache/aurora/executor/common/health_checker.py 
> 60676ba0fbd8a218fe4309f07de28e2c66d54530 
>   src/main/python/apache/aurora/executor/common/resource_manager.py 
> 08e02e41b581f275f070228bb23c4cf2a0489f9a 
>   src/main/python/apache/aurora/executor/common/status_checker.py 
> 624921d68199df098ea51ee8a10815403bf58984 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 6b782778e52394de3744b43003226dac3f65169e 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py 
> def249c2509a28f7145380f250f79202b653dc83 
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py 
> 8f288f6115ab52265dfaffffda3f41d81271c55a 
> 
> Diff: https://reviews.apache.org/r/31104/diff/
> 
> 
> Testing
> -------
> 
> This hangs after I call is_health_checks_enabled, and I don't know why.  My 
> suspicion is that I'm throwing an exception and cratering the task executor, 
> but I don't know how to confirm that.  How do I get the traceback to print?  I'm running the tests with:
> 
> ./pants test src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Moses Nakamura
> 
>
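On the hang described under Testing: an exception raised in a background thread does not propagate to the thread waiting on it, and a test runner that captures output can hide the traceback, so the test can appear to simply hang. One generic way to surface such failures is to record the exception in the worker and re-raise it in the caller after a bounded join. The `ReportingThread` class and its `join_or_raise` method below are hypothetical helpers, not part of Aurora or pants.

```python
import sys
import threading
import traceback


class ReportingThread(threading.Thread):
    """Hypothetical helper: records any exception raised by the target so the
    caller can re-raise it instead of waiting forever."""

    def __init__(self, fn, *args, **kwargs):
        super(ReportingThread, self).__init__()
        self.daemon = True  # a stuck worker will not keep the process alive
        self._fn = fn
        self._fn_args = args
        self._fn_kwargs = kwargs
        self.result = None
        self.exc_info = None

    def run(self):
        try:
            self.result = self._fn(*self._fn_args, **self._fn_kwargs)
        except Exception:
            self.exc_info = sys.exc_info()
            # Print the full traceback to stderr as soon as it happens.
            traceback.print_exception(*self.exc_info)

    def join_or_raise(self, timeout):
        """Waits up to `timeout` seconds, then fails loudly instead of hanging."""
        self.join(timeout)
        if self.is_alive():
            raise RuntimeError('worker still running after %s seconds' % timeout)
        if self.exc_info is not None:
            # Re-raise the worker's exception in the calling thread.
            raise self.exc_info[1]
```

Wrapping whatever the test starts in the background (for example, a status checker that runs on its own thread) turns a silent hang into either a timeout error or the original exception.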
