Jim Klimov wrote:
> I can't find an answer whether it is possible for an SMF service to run
> automated health checks (as defined by the service script's author) and
> restart if required.
> 
> For a specific example, we run Magnolia CMS in a Tomcat server in a zone.
> It depends on a MySQL server to work properly. This server lives in another
> zone (usually on another server, in fact). One of the problem scenarios is
> as follows:
> * if the MySQL server is not running, the Magnolia web-app is not initialized;
> * if the MySQL server was restarted, the web-app's connection breaks.
> In either case, Tomcat is running (SMF is happy - its contract service is
> fulfilled), but the end-user service is no longer provided. To fix the
> problem, the web-app or the whole web-container needs to be restarted.
> 
> Other scenarios with different web-applications involve running out of
> memory, or slowing down and then crawling to death. In any case, as soon as
> the end-user service falls below an SLA (say, the web-site's page takes over
> 5s to render, or some runaway loop consumes 95+% CPU for many minutes), the
> service should be recycled - it is known to help. For most third-party
> applications, we can't fix them directly (i.e. rewrite them to prevent them
> from failing), but we have to do our best to reduce downtime for the
> customers.
> 
> We do currently have some scripts to run such checks and maintain our
> services, so their logic (and to some extent implementation) is not the
> problem. These scripts are placed into root's (or webserver user's) crontab
> and invoke init scripts to recycle services.
> 
> I want to convert these scripts and crontabs into a single SMF service which 
> includes complicated self-monitoring, to reduce the complexity of (default) 
> configuration as well as improve observability. I want to stress again that a
> working contract in the OS is not the only metric which describes a service
> as truly "online".
> 
> Are there some established best-practices and examples, or am I doomed to 
> invent another bicycle? ;)
> 

Jim,

I don't think SMF directly supports application-specific monitoring. 
However, you may want to look into Solaris Cluster / Open HA Cluster. 
The HA Cluster software supports application-specific monitoring as well 
as something called "restart dependencies", which would allow you to 
specify that Tomcat should be restarted automatically whenever MySQL is 
restarted.
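
If a cluster is more than you need, the checks you already run from cron
could ask SMF to restart the service instead of calling init scripts. Below
is only a rough sketch in Python, not an SMF feature: the URL, the FMRI and
the function names are made up for illustration, and the 5-second limit is
just the page-render figure from your mail. The only SMF piece it relies on
is the standard "svcadm restart <FMRI>" command.

```python
import http.client
import subprocess
import time
import urllib.request

# Hypothetical values for illustration; none of these come from the thread
# except the 5-second page-render limit used as an example SLA.
CHECK_URL = "http://localhost:8080/"
SERVICE_FMRI = "svc:/application/tomcat:default"
MAX_SECONDS = 5.0


def fetch_seconds(url, timeout):
    """Return how long a GET of url took in seconds, or None if it failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, broken response.
        return None
    return time.monotonic() - start


def needs_restart(elapsed, limit=MAX_SECONDS):
    """A failed (None) or too-slow probe means the service should be recycled."""
    return elapsed is None or elapsed > limit


def restart_via_smf(fmri=SERVICE_FMRI):
    """Ask SMF to restart the service instance."""
    subprocess.run(["svcadm", "restart", fmri], check=True)


def check_once(probe, restart, limit=MAX_SECONDS):
    """Run one probe and call restart() if it fails. Returns True if restarted."""
    if needs_restart(probe(), limit):
        restart()
        return True
    return False
```

From cron (or from a loop in a separate monitoring service) one check would
then be: check_once(lambda: fetch_seconds(CHECK_URL, MAX_SECONDS),
restart_via_smf). The probe and the restart action are passed in so the
decision logic can be tried out without touching a real service.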

For information on Open HA Cluster, which runs on OpenSolaris, see 
http://www.opensolaris.org/os/community/ha-clusters/ohac/

For information on Solaris Cluster, which runs on Solaris 9 and 10, see 
http://www.sun.com/software/solaris/cluster/index.xml

Direct your HA cluster questions to ha-clusters-discuss at opensolaris.org

Thanks,
Nick
