James Carlson wrote:
> Right, but the distinction I was drawing was between the "parent is on
> the hook to figure out what to do about failures" design school (i.e.,
> traditional UNIX) and the new SMF+contracts school that (at least by
> default) bucks that trend.

Yes. SMF removes some isolation; it is less forgiving of failure at any level.

> That's not quite what I'm talking about. In the case I'm talking
> about, the processes that were launched really were never any
> responsibility of the original caller. He may not have known that the
> processes existed, and didn't _expect_ to be taking any ownership of
> them, and, in some cases (as with event hooks), he probably has no
> design-level control over them at all -- some _end user_ is installing
> binaries for him to run.

Hmm. Maybe.

I know you tried to steer us away from the "restart" question, but I think that remains relevant here. What is the goal of detecting the failure at the SMF level and acting on it?

Is it to attempt to repair a broken service? If so, then restarting the parent is probably overkill. The further the failure is from the parent, the less likely it is that a restart will happen to fix the problem, and the more likely it is that other, working, parts of the system will be unnecessarily disrupted.

Is it a mechanism to report and call attention to a failure, so that the administrator can do something about it? If so, my intolerant side is in favor of all kinds of alarm bells and klaxons. Whatever will get somebody's attention. (The administrator can always do something about it. Uninstall the flakey component. Get on the phone to the component's vendor. Whatever it takes.)

In either event, I don't think the goal is to assign blame for the failure.

> Still, even in a development environment, I don't want ifconfig(1M)
> (or its invoker) to take a signal if dhcpagent or in.mpathd dies, or
> if dhcpagent's event hook script (from the user) drops core. In those
> cases, that'll happen today, and it's not right.

I'm not so sure. If those things fail, I *want* the issue raised. In my more draconian moments, I want it raised so loudly that it cannot be ignored, so that it _must_ be addressed.
(Plus, in a development environment, doesn't your "all in the same playground" comment apply?)