[smf-discuss] svc.startd notices dead child, kills the parent

Jordan Brown (Sun) Wed, 30 Apr 2008 10:58:56 -0700

James Carlson wrote:
> Right, but the distinction I was drawing was between the "parent is on
> the hook to figure out what to do about failures" design school (i.e.,
> traditional UNIX) and the new SMF+contracts school that (at least by
> default) bucks that trend.


Yes.  SMF removes some isolation; it is less forgiving of failure at any 
level.

> That's not quite what I'm talking about.  In the case I'm talking
> about, the processes that were launched really were never any
> responsibility of the original caller.  He may not have known that the
> processes existed, and didn't _expect_ to be taking any ownership of
> them, and, in some cases (as with event hooks), he probably has no
> design-level control over them at all -- some _end user_ is installing
> binaries for him to run.

Hmm.  Maybe.

I know you tried to steer us away from the "restart" question, but I 
think that remains relevant here.

What is the goal of detecting the failure at the SMF level and acting on it?

Is it to attempt to repair a broken service?  If so, then restarting the 
parent is probably overkill.  The further the failure is from the 
parent, the less likely it is that a restart will happen to fix the 
problem, and the more likely it is that other, working, parts of the 
system will be unnecessarily disrupted.

Is it a mechanism to report and call attention to a failure, so that the 
administrator can do something about it?  If so, my intolerant side is 
in favor of all kinds of alarm bells and klaxons.  Whatever will get 
somebody's attention.

        (The administrator can always do something about it.  Uninstall
        the flakey component.  Get on the phone to the component's
        vendor.  Whatever it takes.)

In either event, I don't think the goal is to assign blame for the failure.

> Still, even in a development environment, I don't want ifconfig(1M)
> (or its invoker) to take a signal if dhcpagent or in.mpathd dies, or
> if dhcpagent's event hook script (from the user) drops core.  In those
> cases, that'll happen today, and it's not right.

I'm not so sure.  If those things fail, I *want* the issue raised.  In 
my more draconian moments, I want it raised so loudly that it cannot be 
ignored, so that it _must_ be addressed.  (Plus, in a development 
environment, doesn't your "all in the same playground" comment apply?)
