Richard Elling writes:
> James Carlson wrote:
> > However, that's not the tradition that was in use here, and that's
> > *not* what anyone calling fork/exec was ever expecting before.  The
> > change is surprising, and (apparently) hasn't been backed with a
> > rethink of how all these legacy bits of software actually use those
> > interfaces.
> >   
> 
> In the case of networks, which I think is crux of this thread, the

Actually, no, it's not.  Networking is only tangential here.  The crux
of the thread was how multiple processes on a single system that are
invoked by a parent doing fork/exec are able to cooperate to provide a
service.

SMF/contractfs supposes (by default) that a core dump of any of the
processes means the whole group is dead and should be restarted.  This
has the side-effect of meaning that any state associated with the
entire group is at the mercy of the weakest member of the group, even
if that member was invoked *unwittingly* by one of the other members.

Previously, in traditional UNIX (again, not HA, not SC), doing
fork/exec meant you were on the hook only to do wait() and perhaps use
SIGCLD -- and then only if you really cared.  The SMF behavior
surprisingly enforces the "caring" in a way that sometimes takes a
trifle of a problem (an inessential process dumping core, which is
not the case here but can happen) and turns it into a mountain of a
problem (restarting some crucial service).
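For concreteness, here is a minimal sketch of that traditional
contract, assuming a POSIX system.  The names run_helper and
helper_result are made up for illustration and are not taken from
ifconfig or any other real program.  The parent does fork/exec, calls
waitpid(), and decides for itself what a dead child means; nothing
outside the parent imposes a restart.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum helper_result {
    HELPER_OK,      /* child exited with status 0 */
    HELPER_FAILED,  /* child exited non-zero, or could not be started */
    HELPER_KILLED   /* child was terminated by a signal */
};

/*
 * Launch argv[0] via fork/exec and wait for it.  The caller alone
 * decides what to do about the outcome; the parent keeps running
 * whatever happens to the child.
 */
static enum helper_result
run_helper(char *const argv[])
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid == -1)
        return HELPER_FAILED;

    if (pid == 0) {
        /* Child: replace ourselves with the helper. */
        execv(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: reap the child, retrying if interrupted by a signal. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return HELPER_FAILED;
    }

    if (WIFSIGNALED(status)) {
        int dumped = 0;
#ifdef WCOREDUMP                /* common, but not required by POSIX */
        dumped = WCOREDUMP(status) ? 1 : 0;
#endif
        fprintf(stderr, "%s: killed by signal %d%s\n", argv[0],
            WTERMSIG(status), dumped ? " (core dumped)" : "");
        return HELPER_KILLED;
    }

    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return HELPER_OK;

    return HELPER_FAILED;
}
```

A caller that regards the helper as inessential can log
HELPER_KILLED and carry on.  That is the choice the default SMF
contract behavior takes away from the parent.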

The fact that the problem was noticed in a networking daemon was
purely incidental.  The problem really has nothing whatsoever to do
with networking.

Now, it's entirely reasonable to argue (as Jordan did) that killing
all the processes in a failing service is a Good Thing.  It probably
is.  But it's also a fundamental change, and it's unclear that the
implications of this change have been worked through correctly across
the system and (worse) in third-party code.

It's obvious that in the case of ifconfig launching dhcpagent and
in.mpathd (again, via fork/exec, not any networking mechanism), this
was *not* properly factored in.

> HA folks deal with logical addresses which are part of the resource
> group structure.  This has a learning curve for sys admins. What can
> we do to make this easier for everyone?

Logical addresses (or addresses of any sort) seem to have nothing to
do with the discussion we've been having.  I'm confused by your
question.

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
