> On Feb. 18, 2015, 7 p.m., Stephan Erb wrote:
> > For this change to be useful, we also have to think about the meaning of
> > `initial_interval_secs`. In its current form, health checks only start when
> > the initial delay has passed. Commonly this delay has to be set very high
> > in order to guarantee that a task will come up even in a worst-case
> > scenario (e.g., the server where I pull my binary from is slow today). With
> > your change, however, no task would be considered running until this
> > worst-case time window has passed.
> > 
> > A potential solution would be to change the meaning of
> > `initial_interval_secs` to always send health checks but to ignore any
> > errors.
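To make the suggestion quoted above concrete, here is a minimal sketch of the proposed semantics: checks are sent from the start, failures are ignored while `initial_interval_secs` has not yet elapsed, and the first successful check marks the task as running. The class `GracePeriodHealthChecker`, its `poll` method, the `max_consecutive_failures` argument, and the injectable `clock` are all hypothetical names chosen for illustration; this is not the executor's actual health checker.

```python
import time
from typing import Callable


class GracePeriodHealthChecker:
    """Hypothetical sketch of the proposed semantics; not Aurora's real class."""

    def __init__(self,
                 check: Callable[[], bool],
                 initial_interval_secs: float,
                 max_consecutive_failures: int = 0,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check  # returns True when the application is healthy
        self._initial_interval_secs = initial_interval_secs
        self._max_consecutive_failures = max_consecutive_failures
        self._clock = clock  # injectable so the sketch can be tested
        self._start = clock()
        self._failures = 0
        self.running = False  # set by the first successful check
        self.failed = False   # set once failures count after the grace period

    def in_grace_period(self) -> bool:
        return self._clock() - self._start < self._initial_interval_secs

    def poll(self) -> None:
        """Send one health check and update the state."""
        if self._check():
            self._failures = 0
            self.running = True
            return
        if self.in_grace_period():
            # Assumption: errors inside the initial interval are ignored.
            return
        self._failures += 1
        if self._failures > self._max_consecutive_failures:
            self.failed = True
            self.running = False
```

With this shape, a task that comes up quickly is reported as running at its first successful check, while a slow task still gets the full initial interval before any failure counts against it.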
+1, that's a good idea.


- Moses


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31104/#review72980
-----------------------------------------------------------


On Feb. 18, 2015, 4:32 a.m., Moses Nakamura wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31104/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 4:32 a.m.)
> 
> 
> Review request for Aurora and Brian Wickman.
> 
> 
> Bugs: AURORA-894
>     https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the first step in changing TASK_RUNNING to mean that the application
> is alive and responding to health checks (if the task is configured to
> support health checks). This review is just to get feedback; I can't do this
> review in parts because the scheduler must be changed in lockstep with the
> executor, or everything will break.
> 
> I don't know if this is the right approach. Could you give me some high-level
> advice? I'm also not sure who to add to this review.
> 
> Here is the high level description that we came up with:
> 
> http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201412.mbox/%3CCAOTkfX4KTUpMVcjeFf5%3DvvGXb91to5baNSzvyiwtk-sTddxGXQ%40mail.gmail.com%3E
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 9c0282392dbb9cca308baf47adc1750c1f5cacc6
>   src/main/python/apache/aurora/executor/common/announcer.py dda76f018f472d7d8228459eb89f4c5daf9df26d
>   src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530
>   src/main/python/apache/aurora/executor/common/resource_manager.py 08e02e41b581f275f070228bb23c4cf2a0489f9a
>   src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984
>   src/test/python/apache/aurora/executor/common/test_announcer.py 6b782778e52394de3744b43003226dac3f65169e
>   src/test/python/apache/aurora/executor/common/test_health_checker.py def249c2509a28f7145380f250f79202b653dc83
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py 8f288f6115ab52265dfaffffda3f41d81271c55a
> 
> Diff: https://reviews.apache.org/r/31104/diff/
> 
> 
> Testing
> -------
> 
> This hangs after I call is_health_checks_enabled, and I don't know why. My
> suspicion is that I'm throwing an exception and cratering the task executor,
> but I don't know how to tell. How do I get it to print? I'm running it with:
> 
> ./pants test src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Moses Nakamura
> 
>