Roland Mainz wrote:
>> svc.startd(1M) says that a contract-model service has failed if:
>>   o all processes in the service exit
>>   o any processes in the service produce a core dump
>
> How is this detected ?

I don't know the details, but the processes are all members of a new
process-grouping construct called a "contract". There are mechanisms
for a monitoring process (svc.startd in this case) to detect events
that happen to processes inside the contract, even when they are not
its immediate children.

>>   o a process outside the service sends a service process a
>>     fatal signal (for example, an administrator terminates
>>     a service process with the pkill command)
>
> What happens if a process within the service sends another process
> within the service a SIGKILL and doesn't reap the child's corpse
> immediately ? And what happens if the same happens between threads of
> the same process (which will be important since we're going to add
> thread support to ksh93 and these threads should be able to communicate
> via signals, too) ?

I assume that, since those signals originate inside the contract, they
do not trigger this rule.

>>> Note that any exit
>>> code from 0 ... 255 is _valid_ for shell scripts and applications and
>>> killing whole services just because a child process returned a non-zero
>>> exit code may not be a good idea (I hope it's not implemented this way).
>> I don't believe it is.
>
> Any idea who may know this exactly ?

One of the true SMFers could speak up here, but I'm almost completely
certain that a non-zero exit code inside the contract doesn't trigger a
contract failure. That would cause *many* shell scripts to fail.

> Is there a way to turn this behaviour "off" ?

svc.startd(1M):

     startd/ignore_error

         The ignore_error property, if set, specifies a comma-separated
         list of ignored events. Legitimate string values in that list
         are core and signal. The default is to restart on all errors.
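
For what it's worth, here is a rough sketch of how that property might appear in a service manifest. It assumes the property lives in a property group named startd of type framework, which is how I have seen it in existing manifests; the rest of the service definition is omitted and would be whatever your manifest already contains. Treat it as an illustration rather than a checked configuration.

```xml
<!-- Sketch only: goes inside the service (or instance) element of the manifest. -->
<!-- Assumption: the property group is named 'startd' and has type 'framework'. -->
<property_group name='startd' type='framework'>
    <!-- Do not treat core dumps or externally sent fatal signals as service failure. -->
    <propval name='ignore_error' type='astring' value='core,signal' />
</property_group>
```

Listing only core or only signal in the value should ignore just that one kind of event, going by the man page text above.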