On Wed, 2006-05-17 at 16:48 -0500, Nicolas Williams wrote:
> On Wed, May 17, 2006 at 02:10:03PM -0700, lianep at eng.sun.com wrote:
> > In the meantime, you could put together a new service which runs this
> > sort of checking, and runs an appropriate svcadm command when the tests
> > fail. It isn't particularly elegant, but would be pretty simple to do.
> > The big thing is you'd want to take into account the target service's
> > "state" and "next_state", and not send a bunch of restart commands if
> > the service is offline, in maintenance, or in the middle of a
> > transition.
>
> Or perhaps you could fire off a monitor from the start method of the
> actual service to be monitored using ctrun to run the monitor in its own
> process contract and restartably. This avoids having a separate SMF
> service polluting the SMF service namespace.

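For reference, the state check Liane describes could look roughly like the sketch below. It is only an illustration under some assumptions: the helper names (should_restart, read_states, check_once) and the probe callback are made up for this example, the states are read with "svcs -H -o state,nstate", and treating "online" and "degraded" as the only states worth restarting from is my own reading of her advice, not something SMF prescribes.

```python
import subprocess

# Assumption: only restart a service that is supposed to be running.
# offline, maintenance, disabled and uninitialized are deliberately excluded.
RESTARTABLE_STATES = {"online", "degraded"}


def should_restart(state, next_state, healthy):
    """Decide whether a failed health test should trigger a restart."""
    if healthy:
        return False
    # svcs prints "-" in the nstate column when no transition is in progress.
    if next_state != "-":
        return False
    return state in RESTARTABLE_STATES


def read_states(fmri):
    """Return (state, next_state) for one service instance, as reported by svcs."""
    out = subprocess.run(
        ["svcs", "-H", "-o", "state,nstate", fmri],
        check=True, capture_output=True, text=True,
    ).stdout
    state, next_state = out.split()
    return state, next_state


def check_once(fmri, probe):
    """Run one monitoring pass; probe() is a hypothetical health test returning bool."""
    state, next_state = read_states(fmri)
    if should_restart(state, next_state, probe()):
        subprocess.run(["svcadm", "restart", fmri], check=True)
        return True
    return False
```

Keeping the decision in a pure function means the restart policy can be checked without touching a live system; only read_states and check_once need svcs and svcadm on the PATH.
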
Running the monitor under ctrun can get a bit complicated. Suppose FMA
kills the monitor contract and the restarted monitor loses whatever state
it held about the monitored service. For simple monitors, such as "does
the process exist," this won't be a problem. For a monitor which is making
a database transaction, there needs to be enough smarts in the monitor
to cancel any in-flight transactions which might interfere with its
analysis of the database health. It is not clear to me that stateless
monitors will be more useful than the current method, so it might be
somewhat complex to write a good monitor.

Monitors also tend to have timeouts, which further complicates their
deployment. It is not clear to me that we can avoid following the current
path of cluster monitors, even as they get more complicated (e.g.
dynamically adjustable timeouts). It might be better just to implement a
single-node cluster instead, when possible, thus leveraging the existing
agents.

-- richard