I'm not the maintainer of any C code, anywhere. While I do host a
mirror or two on Bitbucket, I only do humble scripts, sorry. Gerrit is
around; he's just a bit elusive.
On 6/16/2015 9:37 AM, Buck Evan wrote:
I'd still like to get this merged.
Avery: are you the current maintainer?
I haven't seen Gerrit Pape on the list.
On Tue, Feb 17, 2015 at 4:49 PM, Buck Evan <b...@yelp.com> wrote:
On Tue, Feb 17, 2015 at 4:20 PM, Avery Payne
<avery.p.pa...@gmail.com> wrote:
>
> On 2/17/2015 11:02 AM, Buck Evan wrote:
>>
>> I think there are only three cases here:
>>
>> 1. Users who would have gotten an immediate failure, and no amount of
>>    spinning would help. These users will see their error delayed by
>>    $SVWAIT seconds, but no other difference.
>> 2. Users who would have gotten an immediate failure, but could have
>>    gotten a success within $SVWAIT seconds. All of these users will
>>    of course be glad of the change.
>> 3. Users who would not have gotten an immediate failure. None of
>>    these users will see the slightest change in behavior.
>>
>> Do you have a particular scenario in mind when you mention "breaking
>> lots of existing installations elsewhere due to a default behavior
>> change"? I don't see any case that this change would break.
<snip>
Thanks for the thoughtful reply, Avery. My background is also
"maintaining business software", although putting it in those terms
gives me horrific visions of Java servlets and SOAP protocols.
> I have to look at it from a viewpoint of "what is everything
> else in the system expecting when this code is called". This
> means thinking in terms of code-as-API, so that calls elsewhere
> don't break.
As a matter of API, sv-check already sometimes takes up to $SVWAIT
seconds to fail.
Any caller of sv-check should expect this (strictly bounded) delay
in the exceptional case.
My patch just extends this existing, documented behavior to the
special case of "unable to open supervise/ok".
The API is unchanged; only the time it takes to return the result
changes.
> This happens because the use of "sv check (child)" follows the
> convention of "check, and either succeed fast or fail fast", ...
Either you're confused about what sv-check does, or I'm confused about
what you're saying.
sv-check generally doesn't fail fast, except in the one special case
I'm trying to change (runsv is not started).
Otherwise, it spins for up to $SVWAIT seconds before failing.
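To make the "spin, then fail" semantics concrete, here is a minimal POSIX sh sketch, assuming a simple once-per-second poll. The `wait_for` function is hypothetical and only illustrates the behavior; runit's sv is written in C and does not use it. It retries a check command until it succeeds, or gives up once $SVWAIT seconds (default 7) have elapsed.

```sh
#!/bin/sh
# Hypothetical sketch of bounded "spin, then fail" waiting.
# wait_for CMD [ARGS...]: run CMD once per second until it exits 0,
# or give up once $SVWAIT seconds (default 7) have elapsed.
# Illustration only; this is not runit's sv implementation.
wait_for() {
    limit=${SVWAIT:-7}   # plain globals: POSIX sh has no "local"
    elapsed=0
    while :; do
        if "$@"; then
            return 0     # check passed: succeed as soon as possible
        fi
        if [ "$elapsed" -ge "$limit" ]; then
            return 1     # time budget spent: report failure
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

The bound is what keeps this safe for callers: a failure is reported at most about $SVWAIT seconds later than it otherwise would be, and never turns into an indefinite hang.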
> Without that fast-fail, the logged hint never occurs; the
> sysadmin now has to figure out which of three possible services in
> a dependency chain is causing the hang.
Even if I put the above issue aside, you wouldn't get a hang;
you'd get the failure message you're familiar with, just several
seconds (default: 7) later. The sysadmin wouldn't have to search any
more than before. He would, however, find that the system fails less
often, since it now has those 7 seconds of tolerance. This is how
sv-check already behaves when a ./check script exits nonzero.
> While this is implemented differently from other installations,
> there are known cases similar to what I am doing, where people have
> ./run scripts like this:
>
> #!/bin/sh
> sv check child-service || exit 1
> exec parent-service
This would still work just fine; it would simply succeed in strictly
more cases.