I'm +1 on removing the time control as well. If you need to extend the snooze, you could always touch -m the snooze file?
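
A minimal sketch of how an mtime-based snooze could work, under one reading of the suggestion above: the snooze file carries no contents, and the snooze counts as active while the file exists and its mtime falls within a fixed window. The file name is borrowed from Brian's '.healthchecksnooze' suggestion; the constant SNOOZE_WINDOW_SECS and the function is_snoozed are hypothetical and are not taken from the Aurora health checker.

```python
import os
import time

# Hypothetical names for illustration only; not part of the Aurora code base.
SNOOZE_FILE = ".healthchecksnooze"
SNOOZE_WINDOW_SECS = 600


def is_snoozed(sandbox_dir, now=None, window_secs=SNOOZE_WINDOW_SECS):
    """Return True if the snooze file exists and was modified recently.

    The file contents are never read, so there is no malformed-value case.
    One stat() call is made per invocation.
    """
    path = os.path.join(sandbox_dir, SNOOZE_FILE)
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # Missing (or unreadable) snooze file: health checks run as usual.
        return False
    if now is None:
        now = time.time()
    return (now - mtime) < window_secs
```

Under these assumptions, running touch -m on the snooze file resets its mtime and restarts the window, and deleting the file ends the snooze immediately.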

On Wed, Oct 8, 2014 at 9:51 AM, Bill Farner <wfar...@apache.org> wrote:

> > On Oct. 6, 2014, 10:40 p.m., Brian Wickman wrote:
> > > docs/configuration-reference.md, lines 359-360
> > > <https://reviews.apache.org/r/26383/diff/1/?file=714257#file714257line359>
> > >
> > > Is there any reason this needs to be configurable? Why not just
> > > hardcode the filename as '.healthchecksnooze' and then allow the
> > > user to specify the snooze at runtime, e.g.
> > > echo '600' > .healthchecksnooze to sleep for 600 seconds. (And if
> > > the value is malformed, just unlink and don't snooze.)
> >
> > David Pan wrote:
> >     A few questions:
> >     1. In the case if the value is malformed, do we want to force
> >        fail the health check in order to alert the user the fact
> >        that they failed at setting the snooze duration. Otherwise,
> >        the user might not notice the mistake.
> >     2. Should we read the value from the file only the first time
> >        the snooze file is created. Or, should we allow the user
> >        change the value on the fly.
> >
> > Brian Wickman wrote:
> >     1. I think force failing the health check would be
> >        counterproductive -- if you accidentally fat finger, you
> >        don't want to kill the task which might already be a unicorn.
> >        Instead just unlinking right away or perhaps using a default
> >        snooze value.
> >
> >     2. Maybe do mtime + time in the file as the snooze expiration,
> >        and re-read if the mtime changed. This means at least doing a
> >        stat() each health check interval, but will allow you to
> >        change the snooze on the fly.
>
> How about removing the time control altogether, and let presence of the
> file serve as the snooze? It's easier to add the time control later than
> to remove it if we decide it's unneeded. This allows us to sidestep the
> malformed file content discussion.
>
>
> - Bill
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26383/#review55592
> -----------------------------------------------------------
>
>
> On Oct. 6, 2014, 9:24 p.m., David Pan wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/26383/
> > -----------------------------------------------------------
> >
> > (Updated Oct. 6, 2014, 9:24 p.m.)
> >
> >
> > Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
> >
> >
> > Repository: aurora
> >
> >
> > Description
> > -------
> >
> > The health check disabler allows health checks for a job to be
> > snoozed temporarily by touching a snooze file in the job's sandbox.
> > The path of the snooze file and the snooze duration can be set in the
> > HealthCheckConfig. The appropriate unit tests were modified/added.
> >
> > The corresponding JIRA ticket is the following:
> > https://issues.apache.org/jira/browse/AURORA-795
> >
> >
> > Diffs
> > -----
> >
> >   docs/configuration-reference.md
> >     5166d45ddf95ae5d8afe39dd3b00654ac91857ec
> >   docs/configuration-tutorial.md
> >     67998e9dab6ac429d96d7c0d2df959336b767f32
> >   src/main/python/apache/aurora/config/schema/base.py
> >     f12634f103c3eb20e43f37c25d9b0fc3e3d228ec
> >   src/main/python/apache/aurora/executor/common/health_checker.py
> >     4980411c847d12655cbb363404707ebd9f0bd163
> >   src/test/python/apache/aurora/executor/common/BUILD
> >     c7f7a003c865d479ba6e3cd7b5349322f884f653
> >   src/test/python/apache/aurora/executor/common/test_health_checker.py
> >     aa36415fa891fc523a3a376ffeca5d3cd5ceabec
> >
> > Diff: https://reviews.apache.org/r/26383/diff/
> >
> >
> > Testing
> > -------
> >
> > On vagrant in ~/aurora, I ran
> > ./pants src/test/python/apache/aurora/executor::
> >
> >
> > Thanks,
> >
> > David Pan
> >
> >