Richard Elling writes:
> James Carlson wrote:
> > However, that's not the tradition that was in use here, and that's
> > *not* what anyone calling fork/exec was ever expecting before.  The
> > change is surprising, and (apparently) hasn't been backed with a
> > rethink of how all these legacy bits of software actually use those
> > interfaces.
> >
>
> In the case of networks, which I think is crux of this thread, the

Actually, no, it's not.  Networking is only tangential here.

The crux of the thread was how multiple processes on a single system
that are invoked by a parent doing fork/exec are able to cooperate to
provide a service.  SMF/contractfs supposes (by default) that a core
dump of any of the processes means the whole group is dead and should
be restarted.  This has the side-effect of meaning that any state
associated with the entire group is at the mercy of the weakest member
of the group, even if that member was invoked *unwittingly* by one of
the other members.

Previously, in traditional UNIX (again, not HA, not SC), doing
fork/exec meant you were on the hook only to do wait() and perhaps use
SIGCLD -- and then only if you really cared.  The SMF behavior
surprisingly enforces the "caring" in a way that sometimes takes a
trifle of a problem (an inessential process [not the case here, but
can be] dumping core) and turns it into a mountain of a problem
(restarting some crucial service).

The fact that the problem was noticed in a networking daemon was
purely incidental.  The problem really has nothing whatsoever to do
with networking.

Now, it's entirely reasonable to argue (as Jordan did) that killing
all the processes in a failing service is a Good Thing.  It probably
is.  But it's also a fundamental change, and it's unclear that the
implications of this change have been worked correctly throughout the
system and (worse) through third-party code.  It's obvious that in the
case of ifconfig launching dhcpagent and in.mpathd (again, via
fork/exec, not any networking mechanism), this was *not* properly
factored.

> HA folks deal with logical addresses which are part of the resource
> group structure. This has a learning curve for sys admins. What can
> we do to make this easier for everyone?

Logical addresses (or addresses of any sort) seem to have nothing to
do with the discussion we've been having.  I'm confused by your
question.

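For illustration only, here is a minimal sketch of the traditional
fork/exec contract described above: the parent's only obligation is to
reap the child with wait(), and a helper that dies on a signal (and may
dump core) is merely reported to the parent, which carries on.  This is
not code from ifconfig, dhcpagent, or SMF; the name run_helper and the
crashing helper command are assumptions made up for the example.

```python
import os
import signal
import sys


def run_helper(argv):
    """Fork/exec a helper, reap it, and describe how it ended."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the helper program.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)
    # Parent: the one traditional duty is to reap the child.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "exited with status %d" % os.WEXITSTATUS(status)


if __name__ == "__main__":
    # Hypothetical helper that aborts itself, as a crashing child would.
    crash = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGABRT)"]
    print("helper", run_helper(crash))
    print("parent still running, pid", os.getpid())
```

In this model the parent decides for itself whether the helper's death
matters; nothing outside the parent restarts anything on its behalf.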
-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677