I'm not the maintainer of any C code, anywhere. While I do host a mirror or two on bitbucket, I only do humble scripts, sorry. Gerrit is around, he's just a bit elusive.

On 6/16/2015 9:37 AM, Buck Evan wrote:
I'd still like to get this merged.

Avery: are you the current maintainer?
I haven't seen Gerrit Pape on the list.

On Tue, Feb 17, 2015 at 4:49 PM, Buck Evan <b...@yelp.com> wrote:

    On Tue, Feb 17, 2015 at 4:20 PM, Avery Payne
    <avery.p.pa...@gmail.com> wrote:
    >
    > On 2/17/2015 11:02 AM, Buck Evan wrote:
    >>
    >> I think there are only three cases here:
    >>
    >>  1. Users that would have gotten immediate failure, and no amount of
    >> spinning would help. These users will see their error delayed by $SVWAIT
    >> seconds, but no other difference.
    >>  2. Users that would have gotten immediate failure, but could have gotten
    >> a success within $SVWAIT seconds. All of these users will of course be glad
    >> of the change.
    >>  3. Users that would not have gotten immediate failure. None of these
    >> users will see the slightest change in behavior.
    >>
    >> Do you have a particular scenario in mind when you mention "breaking lots
    >> of existing installations elsewhere due to a default behavior change"? I
    >> don't see that there is any case this change would break.
    <snip>

    Thanks for the thoughtful reply, Avery. My background is also
    "maintaining business software", although putting it in those terms
    gives me horrific visions of Java servlets and SOAP protocols.

    > I have to look at it from a viewpoint of "what is everything else in
    > the system expecting when this code is called".  This means thinking in
    > terms of code-as-API, so that calls elsewhere don't break.

    As a matter of API, sv-check does sometimes take up to $SVWAIT
    seconds to fail.
    Any caller to sv-check will be expecting this (strictly limited)
    delay, in the exceptional case.
    My patch just extends this existing, documented behavior to the
    special case of "unable to open supervise/ok".
    The API is unchanged; only the time taken to return the result
    changes.

    > This happens because the use of "sv check (child)" follows the
    > convention of "check, and either succeed fast or fail fast", ...

    Either you're confused about what sv-check does, or I'm confused about
    what you're saying.
    sv-check generally doesn't fail fast (except in the special case I'm
    trying to make no longer fail fast -- runsv is not started).
    Generally it will spin for $SVWAIT seconds before failing.
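
As a rough illustration of that spinning behavior (a sketch in plain sh, not the actual sv source): the hypothetical helper poll_until_ok below retries a check and gives up after roughly $SVWAIT seconds. The helper name and the one-second polling interval are assumptions made for simplicity; the default of 7 matches the figure mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of runit: retry a command until it
# succeeds or until roughly $SVWAIT seconds have elapsed.
SVWAIT="${SVWAIT:-7}"

poll_until_ok() {
    # "$@" is the check command to retry.
    tries=0
    while [ "$tries" -le "$SVWAIT" ]; do
        # Succeed as soon as the check passes.
        if "$@"; then
            return 0
        fi
        tries=$((tries + 1))
        # Sleep only if another attempt is still coming.
        if [ "$tries" -le "$SVWAIT" ]; then
            sleep 1
        fi
    done
    # The whole wait elapsed without a passing check.
    return 1
}
```

With this shape, a check that passes on the first attempt returns at once, and a check that never passes returns 1 only after the full wait.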

    > Without that fast-fail, the logged hint never occurs; the sysadmin
    > now has to figure out which of three possible services in a
    > dependency chain are causing the hang.

    Even if I put the above issue aside, you wouldn't get a hang;
    you'd get the failure message you're familiar with, just several
    seconds (default: 7) later. The sysadmin wouldn't search any more than
    previously. He would, however, find that the system fails less often,
    since it has that 7 seconds of tolerance now. This is how sv-check
    behaves already when a ./check script exits nonzero.
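
To make the timing concrete, here is a small sketch of the outcomes under the proposed behavior. The function name outcome and its arguments are made up for illustration; it only models the timing described in this thread and does not call sv at all.

```shell
#!/bin/sh
# Hypothetical model of the proposed behavior, for illustration only.
# $1: seconds until the child is ready, or "never"
# $2: wait limit in seconds (assumed default: 7)
outcome() {
    ready=$1
    wait=${2:-7}
    if [ "$ready" = never ]; then
        # No amount of spinning helps: same error, just delayed.
        echo "fail after ${wait}s"
    elif [ "$ready" -eq 0 ]; then
        # Already up: no change in behavior.
        echo "ok immediately"
    elif [ "$ready" -le "$wait" ]; then
        # Previously an immediate failure, now a success.
        echo "ok after ${ready}s"
    else
        # Ready too late: same error, just delayed.
        echo "fail after ${wait}s"
    fi
}

outcome never 7
outcome 0 7
outcome 3 7
```

The only caller-visible difference from the old behavior is in the first and third calls: the failure arrives later, and the slow starter now succeeds.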


    > While this is implemented differently from other installations, there
    > are known cases similar to what I am doing, where people have ./run
    > scripts like this:
    >
    > #!/bin/sh
    > sv check child-service || exit 1
    > exec parent-service

    This would still work just fine, just strictly more often.


