On Fri, Apr 18, 2008 at 08:11:23PM +0100, Neil Garthwaite wrote:
> > Though, as an aside, I am interested in how this works in real
> > life. If the transient service fails, and enters maintenance, what
> > will the administrator do differently than your stop method script
> > to clean up so that they don't have to reboot to repair the service?
>
> Well, I hope this doesn't appear too messy.
>
> I'm using SMF with Sun Cluster. In particular, in SC we have an agent
> that can failover non-global zones (S10 native as well as S8 and lx
> branded zones) between SC nodes. In addition to failing over the
> non-global zone, if that non-global zone is a S10 zone then we can also
> enable/disable an SMF service within the "failover" zone.
>
> In this regard, SC manages the SMF enable/disable and additionally
> probes the application that was started by the SMF service. This then
> allows the probe to determine wedged applications or a bad application
> and signal back to SC to either perform a local restart or initiate a
> failover to another SC node.

Sounds like you need to have either your own restarter, or
notification APIs by which to monitor service state and react
accordingly. (We've had a debate on this list in recent months about
notification APIs.)

Nico
--