[smf-discuss] svc.startd notices dead child, kills the parent

Sebastien Roy Tue, 29 Apr 2008 19:22:21 -0400

Perhaps an SMF expert can shed some light on the following svc.startd
behavior:


A system is running NWAM, and therefore the network/physical:nwam
service is enabled, and the nwamd daemon is running.  The nwamd daemon
configures a network interface by exec'ing "ifconfig <intf> dhcp start",
which causes ifconfig to in turn exec dhcpagent.

At some point later, dhcpagent dies a horrible death by way of SIGSEGV
and dumps core due to a bug (obviously).

At this point, svc.startd somehow notices the dhcpagent crash and for
some reason decides that the system would be better off if the
network/physical:nwam service were restarted.  It prints the following
anonymous message in /var/svc/log/network-physical:nwam.log:

"Stopping because process dumped core."

(It would be nice if svc.startd were a bit more specific in that log
message, but that's not the core issue.)  It proceeds to stop and start
network/physical:nwam.  Why does it do this?  Is nwamd not to be trusted
to notice that it's unable to acquire a DHCP lease on this interface and
deal with this on its own?  nwamd is likely capable of noticing that
something went wrong with the network interface it was responsible for
and of either retrying the lease acquisition or trying another network
interface.  Even if it isn't today, it's not inconceivable that it
could be.
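
To make concrete what "dealing with this on its own" could look like,
here is a minimal sketch in Python.  It is not nwamd's code (nwamd is
not written in Python), and it assumes the helper is a direct child the
parent can wait on, which may not match how dhcpagent is actually
launched.  The names describe_exit and acquire_lease, and the stand-in
commands, are hypothetical and exist only for illustration.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal as the negated
        # signal number, e.g. -11 for SIGSEGV.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def acquire_lease(commands):
    """Run each candidate helper command in order until one succeeds.

    Hypothetical stand-in for "try this interface, then the next one".
    Returns (index of the command that succeeded, or None; outcome log).
    """
    log = []
    for index, cmd in enumerate(commands):
        result = subprocess.run(cmd)
        log.append(describe_exit(result.returncode))
        if result.returncode == 0:
            return index, log
    return None, log


if __name__ == "__main__":
    # Stand-ins for the real helpers: one child that kills itself with
    # SIGSEGV, and one that exits cleanly.
    crash = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    ok = [sys.executable, "-c", "pass"]
    print(acquire_lease([crash, ok]))
```

The point of the sketch is only that the parent sees why each helper
died and moves on to the next candidate itself, without anything
outside it needing to restart the whole service.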

-Seb
