[smf-discuss] svc.startd notices dead child, kills the parent

James Carlson Wed, 30 Apr 2008 16:00:00 -0400

Liane Praza writes:
> James Carlson wrote:
> > Liane Praza writes:
> >> Yep.  And Dave already went over in the first message of this thread 
> >> how you can opt out of that behaviour as either a service developer or 
> >> an admin.
> > 
> > Right ... though both approaches have some warts to deal with.
> 
> I'm well aware of the warts of the "ignore all cores" solution. :)  What 
> are the warts of the "launch the non-critical processes in a separate 
> contract" solution?


You've got to hunt down all of the daemons that are doing this, modify
the source to handle the contract issues, and (of course) retest
everything.

Perhaps a small issue with some bits, but a more annoying issue if the
source is not of Sun origin.  We'll end up having to fork those things
or #ifdef __sun__ them.

And, obviously, we haven't found all of the problem areas.

> >   a) what the correct practice really is here; I'm slowly gathering
> >      that contract-awareness is *required* for processes that start up
> >      things for which they're disclaiming direct responsibility.
> 
> If awareness can reduce to "use ctrun to exec the process", then yes, I 
> agree.

Yep; that's one way.
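
As a rough illustration of that route (a sketch under assumptions, not
code from any existing daemon): a launcher can prefix the helper's
command line with ctrun(1) so that the helper starts in its own process
contract instead of inheriting the caller's.  The function names and
the helper path are hypothetical, and "-l child" is only one plausible
lifetime choice; the fallback to a plain exec on systems without ctrun
is also an assumption about what a portable daemon might want.

```python
import shutil
import subprocess


def in_new_contract(command, ctrun="ctrun"):
    """Return an argv that runs `command` under ctrun(1).

    "-l child" asks ctrun to exit when the command it started exits.
    """
    return [ctrun, "-l", "child", *command]


def launch_helper(command):
    """Start a helper, in a separate process contract where possible."""
    ctrun = shutil.which("ctrun")  # None on systems without ctrun
    argv = in_new_contract(command, ctrun) if ctrun else list(command)
    return subprocess.Popen(argv)


if __name__ == "__main__":
    # Hypothetical helper path, shown only to illustrate the argv shape.
    print(in_new_contract(["/usr/lib/example/helper", "-x"]))
```

The wrapper keeps the Solaris-specific part in one place, which is about
as close as this gets to avoiding an #ifdef in the caller.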

> >   b) how much additional investigation is required; I suspect that
> >      there are far more of these sorts of cases buried in the source
> >      base than we realize, and that it'll take some time to ferret
> >      them out.
> 
> That might be true.  But, I suspect that there are also plenty of cases 
> where the source assumes its exec'ed process won't be killed due to an 
> uncorrectable memory error too.

I honestly don't know.  :-<

> >   c) whether there are features missing from SMF that should be added;
> >      on that score, I think that on-demand service behavior is a gap.
> 
> OK.  I guess I've seen the on-demand part as orthogonal, but that 
> doesn't really matter.  I'll be interested to see your RFE.

Done -- CR 6696281.

It is orthogonal to this one issue, except that *both* of them address
the same underlying problem.  In this particular case, we have a
service that needs to invoke another on-demand service.

There's no way to do that, so what the code really does is fork+exec.
In the pre-SMF world, that was a great solution.  Post-SMF, not so
much.  Those exec'd processes (no matter what they do) stay in the
same process contract and lurk.

If a subsequent request or problem then makes the daemon fault, the
original invoker -- who may well have gone on to bigger and better
things -- takes the hit.  Any others who came along are unaffected,
as it's an on-demand service which runs as long as there's work.

> Orthogonally: I admit I'm a bit perplexed when I consider on-demand. 
> The 'requests' always come in an application-specific way.  If the 
> application requires a daemon once the request is made, why not leave 
> the daemon around waiting for requests, rather than having some program 
> 'launch' the daemon (by either execing it directly as we have in the 
> past, or telling SMF that it needs the daemon)?  It's not like the 
> daemon waiting for requests needs to use many resources.

That's the core issue of on-demand: users don't want a gazillion idle
processes cluttering the system.

In the past (and since _well_ before my tenure there), PSARC has
demanded that daemons not run unless there's some reason for them to
run.  Having them parked in the background was considered "bad," so
much so that it's even part of the 20 questions.

Perhaps that rule could be revisited.  If so, then I'd happily run
hundreds of little daemons doing nothing, and forget all about
on-demand.  ;-}

(Consider that what inetd does is essentially on-demand: it's just not
generic.  It's based on network traffic demand, and not on anything
else.  Cron is similar.)
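
To make the inetd comparison concrete, here is a minimal sketch of the
pattern, assuming a TCP listener and one short-lived worker per request.
It is an illustration of the idea only, not how inetd is implemented;
serve_on_demand and its parameters are made-up names.

```python
import socket
import subprocess
import sys


def serve_on_demand(listener, worker_argv, max_requests):
    """Start one worker per incoming connection, inetd-style.

    No worker process exists until a request arrives.  The connection
    is handed to the worker as its stdin and stdout.
    """
    exit_codes = []
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        with conn:
            worker = subprocess.Popen(
                worker_argv, stdin=conn.fileno(), stdout=conn.fileno()
            )
            exit_codes.append(worker.wait())
    return exit_codes


if __name__ == "__main__":
    # Demo worker: upper-case one line from the client, then exit.
    demo_worker = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.readline().upper())",
    ]
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))  # port chosen by the kernel
        listener.listen(1)
        print("listening on port", listener.getsockname()[1])
        serve_on_demand(listener, demo_worker, max_requests=1)
```

Note that the only "demand" this understands is a network connection,
which is the sense in which inetd is on-demand but not generic.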

> It also simplifies the IPC of processes which need to request things of 
> those services.  If it's an on-demand service, we need something to 
> notice the request, and then launch the service, and then do the task. 
> If the service is just running waiting for requests, then we just need 
> something to tell the service to do the required task.

I agree.  It's an irritating thing to put up with in general, but it's
long been a design practice.

> I'm not intrinsically opposed to on-demand, but I guess I haven't 
> completely understood the benefit of it.  I'm happy to wait for your RFE 
> to help convince me. :)

OK.  :-/

> > Yep; that's understood.  I'm interested in what to do in order to play
> > nice.  And I'm pretty sure that (as I mentioned to Dave Powell about
> > sendmail) merely disabling restart on core dump is the wrong answer.
> > It might be an _expedient_ answer (not having to track down all of the
> > exposed external process invocations that fall under the "disclaiming"
> > issue), but I don't think it's actually right, which means there's
> > more (and Sun-specific) work to be done.
> 
> It's the best solution we could come up with (in the time available) for 
> services unwilling or unable to change their code in a Solaris-specific 
> way, which is what the Solaris sendmail maintainer told us when we last 
> asked.
> 
> I agree it's not the ideal/right answer.

That's exactly the tension I'm worried about.

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
