[smf-discuss] svc.startd notices dead child, kills the parent

Liane Praza Wed, 30 Apr 2008 12:38:07 -0700

James Carlson wrote:
> Liane Praza writes:
>> Yep.  And Dave already went over in the first message of this thread 
>> how you can opt out of that behaviour as either a service developer or 
>> an admin.
> 
> Right ... though both approaches have some warts to deal with.


I'm well aware of the warts of the "ignore all cores" solution. :)  What 
are the warts of the "launch the non-critical processes in a separate 
contract" solution?

> 
>> This default is how we can make the entire system more fault tolerant in 
>> the face of things like uncorrectable memory errors.  Remember, in S9 we 
> 
> Yes, I understand the rationale behind it; it's the unintended
> consequences that are of interest to me (at least for this thread).
> 
>> But, I'm not really sure what you're trying to accomplish with this 
>> discussion, Jim.  Are you trying to propose that we un-do this default 
>> which was set in S10 and break programs and service definitions which 
>> were made based on these defaults?
> 
> No, no -- not at all.  I'm not sure about others on the thread, but
> I'm interested in finding out:

Thanks for the clarifications, Jim.  I think I was misreading your 
concerns a bit.

> 
>   a) what the correct practice really is here; I'm slowly gathering
>      that contract-awareness is *required* for processes that start up
>      things for which they're disclaiming direct responsibility.

If awareness can reduce to "use ctrun to exec the process", then yes, I 
agree.
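
A minimal sketch of what that could look like from a shell wrapper, 
assuming the helper is started through ctrun(1) so that it lands in its 
own process contract rather than the service's.  The function name 
run_in_own_contract and the fallback for systems without ctrun are 
illustrative only and do not come from this thread:

```shell
#!/bin/sh
# Hypothetical helper: run a command in a new process contract when
# ctrun(1) is available (Solaris), otherwise just run it directly.
run_in_own_contract() {
    if command -v ctrun >/dev/null 2>&1; then
        # -l child: ctrun's lifetime follows the command it launched,
        # so ctrun exits when that command exits.
        ctrun -l child "$@"
    else
        # No ctrun on this system; plain invocation as a fallback.
        "$@"
    fi
}

# Example: a non-critical helper whose crash should not be charged
# to the calling service's contract.
run_in_own_contract echo "helper ran"
```

Whether "-l child" is the right lifetime depends on whether the caller 
wants to wait for the helper; that choice is an assumption here.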

> 
>   b) how much additional investigation is required; I suspect that
>      there are far more of these sorts of cases buried in the source
>      base than we realize, and that it'll take some time to ferret
>      them out.

That might be true.  But I suspect there are also plenty of cases 
where the source assumes its exec'ed process won't be killed by an 
uncorrectable memory error.

> 
>   c) whether there are features missing from SMF that should be added;
>      on that score, I think that on-demand service behavior is a gap.

OK.  I guess I've seen the on-demand part as orthogonal, but that 
doesn't really matter.  I'll be interested to see your RFE.

Orthogonally: I admit I'm a bit perplexed when I consider on-demand. 
The 'requests' always come in an application-specific way.  If the 
application requires a daemon once the request is made, why not leave 
the daemon around waiting for requests, rather than having some program 
'launch' the daemon (by either execing it directly as we have in the 
past, or telling SMF that it needs the daemon)?  It's not like the 
daemon waiting for requests needs to use many resources.

Keeping the daemon running also simplifies the IPC of processes which 
need to request things of those services.  If it's an on-demand service, 
we need something to 
notice the request, and then launch the service, and then do the task. 
If the service is just running waiting for requests, then we just need 
something to tell the service to do the required task.

I'm not intrinsically opposed to on-demand, but I guess I haven't 
completely understood the benefit of it.  I'm happy to wait for your RFE 
to help convince me. :)

> 
>> I do understand your concerns, but believe that this was one of those 
>> tradeoff calls.  Either we make things behave in a manner least likely 
> [...]
> 
> Yep; that's understood.  I'm interested in what to do in order to play
> nice.  And I'm pretty sure that (as I mentioned to Dave Powell about
> sendmail) merely disabling restart on core dump is the wrong answer.
> It might be an _expedient_ answer (not having to track down all of the
> exposed external process invocations that fall under the "disclaiming"
> issue), but I don't think it's actually right, which means there's
> more (and Sun-specific) work to be done.

It's the best solution we could come up with (in the time available) for 
services unwilling or unable to change their code in a Solaris-specific 
way, which is the position the Solaris sendmail maintainer gave us when 
we last asked.

I agree it's not the ideal/right answer.

liane
