I'm +1 on removing the time control as well. If you need to extend the
snooze you could always touch -m the snooze file?
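
A rough sketch of what I have in mind, assuming the checker treats the snooze as active for a fixed window after the file's mtime. The helper names and the 600-second window are hypothetical, not taken from the patch:

```python
import os
import time


def snooze_active(path, window_secs=600, now=None):
    """Return True while the snooze file exists and its mtime is recent.

    Hypothetical helper: window_secs is an assumed fixed snooze window.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # No snooze file (or it cannot be stat'ed): health checks run normally.
        return False
    return now - mtime < window_secs


def extend_snooze(path):
    """Rough equivalent of `touch -m path`: bump mtime, leave atime alone."""
    atime = os.stat(path).st_atime
    os.utime(path, (atime, time.time()))
```

With something like this, re-touching the file restarts the window, and removing the file ends the snooze immediately.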

On Wed, Oct 8, 2014 at 9:51 AM, Bill Farner <wfar...@apache.org> wrote:

>
>
> > On Oct. 6, 2014, 10:40 p.m., Brian Wickman wrote:
> > > docs/configuration-reference.md, lines 359-360
> > > <https://reviews.apache.org/r/26383/diff/1/?file=714257#file714257line359>
> > >
> > >     Is there any reason this needs to be configurable?  Why not just
> hardcode the filename as '.healthchecksnooze' and then allow the user to
> specify the snooze at runtime, e.g. echo '600' > .healthchecksnooze to
> sleep for 600 seconds.  (And if the value is malformed, just unlink and
> don't snooze.)
> >
> > David Pan wrote:
> >     A few questions:
> >     1. If the value is malformed, do we want to force-fail the health
> check in order to alert the user that they failed to set the snooze
> duration?  Otherwise, the user might not notice the mistake.
> >     2. Should we read the value from the file only the first time the
> snooze file is created?  Or should we allow the user to change the value
> on the fly?
> >
> > Brian Wickman wrote:
> >     1. I think force-failing the health check would be counterproductive
> -- if you accidentally fat-finger the value, you don't want to kill the
> task, which might already be a unicorn.  Instead, just unlink right away or
> perhaps use a default snooze value.
> >
> >     2. Maybe do mtime + time in the file as the snooze expiration, and
> re-read if the mtime changed.  This means at least doing a stat() each
> health check interval, but will allow you to change the snooze on the fly.
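
For reference, the mtime-plus-contents scheme described above could look roughly like the sketch below. The class and method names are hypothetical, and it assumes the file holds a duration in seconds and that a malformed file is unlinked rather than failing the health check:

```python
import os
import time


class SnoozeFile(object):
    """Hypothetical reader for a snooze file holding a duration in seconds."""

    def __init__(self, path):
        self._path = path
        self._mtime = None    # mtime seen at the last read
        self._expires = None  # absolute expiration time, or None

    def snoozed(self, now=None):
        now = time.time() if now is None else now
        try:
            mtime = os.stat(self._path).st_mtime  # one stat() per check
        except OSError:
            # File is gone: forget any cached snooze.
            self._mtime = self._expires = None
            return False
        if mtime != self._mtime:
            # File is new or was rewritten: re-read the duration.
            self._mtime = mtime
            try:
                with open(self._path) as fp:
                    self._expires = mtime + float(fp.read().strip())
            except (OSError, ValueError):
                # Malformed content: unlink and do not snooze.
                self._expires = None
                self._discard()
        return self._expires is not None and now < self._expires

    def _discard(self):
        try:
            os.unlink(self._path)
        except OSError:
            pass
```

The file is only re-read when its mtime changes, so the steady-state cost is the single stat() per health check interval.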
>
> How about removing the time control altogether and letting the presence
> of the file serve as the snooze?  It's easier to add the time control
> later than to remove it if we decide it's unneeded.  This allows us to
> sidestep the malformed file content discussion.
>
>
> - Bill
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26383/#review55592
> -----------------------------------------------------------
>
>
> On Oct. 6, 2014, 9:24 p.m., David Pan wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/26383/
> > -----------------------------------------------------------
> >
> > (Updated Oct. 6, 2014, 9:24 p.m.)
> >
> >
> > Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
> >
> >
> > Repository: aurora
> >
> >
> > Description
> > -------
> >
> > The health check disabler allows health checks for a job to be snoozed
> temporarily by touching a snooze file in the job's sandbox.  The path of
> the snooze file and the snooze duration can be set in the
> HealthCheckConfig.  The appropriate unit tests were modified/added.
> >
> > The corresponding JIRA ticket is the following:
> > https://issues.apache.org/jira/browse/AURORA-795
> >
> >
> > Diffs
> > -----
> >
> >   docs/configuration-reference.md
> 5166d45ddf95ae5d8afe39dd3b00654ac91857ec
> >   docs/configuration-tutorial.md
> 67998e9dab6ac429d96d7c0d2df959336b767f32
> >   src/main/python/apache/aurora/config/schema/base.py
> f12634f103c3eb20e43f37c25d9b0fc3e3d228ec
> >   src/main/python/apache/aurora/executor/common/health_checker.py
> 4980411c847d12655cbb363404707ebd9f0bd163
> >   src/test/python/apache/aurora/executor/common/BUILD
> c7f7a003c865d479ba6e3cd7b5349322f884f653
> >   src/test/python/apache/aurora/executor/common/test_health_checker.py
> aa36415fa891fc523a3a376ffeca5d3cd5ceabec
> >
> > Diff: https://reviews.apache.org/r/26383/diff/
> >
> >
> > Testing
> > -------
> >
> > On vagrant in ~/aurora, I ran
> > ./pants src/test/python/apache/aurora/executor::
> >
> >
> > Thanks,
> >
> > David Pan
> >
> >
>
>
