> On Oct. 9, 2014, 2:53 p.m., Bill Farner wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py, line 66
> > <https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66>
> >
> >     FWIW i actually meant to suggest that the snooze has no concept of time 
> > at all.  If the file is there, don't perform health checks.  When you want 
> > to re-enable health checks, delete the file.  Happy to hear what others 
> > think about that.
> 
> Zameer Manji wrote:
>     Doesn't this mean user error can disable health checks forever? I think 
> we should treat disabling health checking as an exceptional state (since the 
> proceses has opted in to health checking) and therefore require user action 
> (increasing mtime) to continue to stay in this state.
> 
> Kevin Sweeney wrote:
>     Presumably we can trust the user here - health checking is after all 
> opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch 
> greatly simplifies the code on our end.
> 
> Zameer Manji wrote:
>     I'm very worried about introducing a feature which can allow unhealthy 
> instances live forever. Furthermore this information isn't exposed on the 
> Aurora UI or Observer UI so there isn't an easy way to check if an instance 
> has health checking disabled or not. A user can be completely oblivious that 
> health checking of their instance has been disabled.

The motivation behind the health check disabler is for another project I am 
working on, which is the JVM Heap Dumper.  Long story short, the JVM Heap 
Dumper needs to disable health checks for the task that is being heap dumped.  
Under normal circumstances, the heap dumper can re-enable the health checks.  
However, if the heap dumper dies unexpectedly, the health checks will remain 
disabled forever if we don't implement time control.  We don't want the 
disabling and enabling of health checks to be a manual process.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
-----------------------------------------------------------


On Oct. 9, 2014, 1:56 a.m., David Pan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26383/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2014, 1:56 a.m.)
> 
> 
> Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The health check disabler allows health checks for a job to be snoozed 
> temporarily by touching a snooze file in the job's sandbox.  The appropriate 
> unit tests were modified/added.
> 
> The corresponding JIRA ticket is the following:
> https://issues.apache.org/jira/browse/AURORA-795
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/common/health_checker.py 
> 4980411c847d12655cbb363404707ebd9f0bd163 
>   src/test/python/apache/aurora/executor/common/BUILD 
> c7f7a003c865d479ba6e3cd7b5349322f884f653 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py 
> aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
> 
> Diff: https://reviews.apache.org/r/26383/diff/
> 
> 
> Testing
> -------
> 
> On vagrant in ~/aurora, I ran
> ./pants src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> David Pan
> 
>

Reply via email to