Hi,

You can try several approaches (I'll list the 2 that I'm aware of):

1) Automatic restarts on OutOfMemory errors:

Add the following to CATALINA_OPTS:

-XX:OnOutOfMemoryError=/usr/sbin/restart_tcserver

Then write your own restart_tcserver script (you can also send an e-mail notification from it, etc.).

2) This is what I do (please criticise/suggest improvements to this approach):

I've got 2 servers running Tomcat + Apache httpd, with heartbeat between them. I run this little script every 15 minutes via cron:

--------------------
# cat /srv/scripts/test_live.sh
#!/bin/bash

# Count the running processes of each service (0 means it is not running)
SERVICE_HTTPD=$(ps -ef | grep -v grep | grep -c httpd)
SERVICE_TOMCAT=$(ps -ef | grep -v grep | grep -c tomcat)
SERVICE_HEARTBEAT=$(ps -ef | grep -v grep | grep -c heartbeat)
# Request a page from the application; "Status: OK" is the expected answer
SERVICE_STATUS=$(/srv/scripts/check_http.pl -H confluence-server.myorg.com -u /blank.html)

# While testing, please uncomment the following echo statements
if [ "$SERVICE_HTTPD" -ne 0 ] && [ "$SERVICE_TOMCAT" -ne 0 ] && [ "$SERVICE_STATUS" = "Status: OK" ]
then
    # echo "SERVICE_HTTPD and SERVICE_TOMCAT and SERVICE_STATUS are OK, everything is fine"
    exit
elif [ "$SERVICE_HEARTBEAT" -ne 0 ]
then
    echo "The following output triggered failover: SERVICE_HTTPD=$SERVICE_HTTPD , SERVICE_TOMCAT=$SERVICE_TOMCAT , SERVICE_STATUS=$SERVICE_STATUS , failing over to spare server"
    # Replace the obfuscated address below with a real one
    echo "The following output triggered failover: SERVICE_HTTPD=$SERVICE_HTTPD , SERVICE_TOMCAT=$SERVICE_TOMCAT , SERVICE_STATUS=$SERVICE_STATUS , failing over to a spare server at $(date)" | /bin/mailx -s "Server $(uname -n) encountered a problem, failing over to a spare server at $(date)" lkolchin at gmail dot com
    # Stopping heartbeat on this node hands the service over to the spare server
    /etc/init.d/heartbeat stop
else
    # echo "This server probably failed over to the spare one, nothing to do"
    exit
fi
---------------------

If Tomcat and Apache are running and the application is responsive ($SERVICE_STATUS), the script does nothing. If at least one of those conditions is not true and heartbeat is still running on this node, it fails over to the spare server.
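
To make the three possible outcomes easier to test without touching heartbeat, the decision could be pulled into a small function. This is only a sketch: the function name decide and the labels ok/failover/standby are made up for illustration and are not part of the script I actually run.

```shell
#!/bin/bash
# Sketch only: the decision from test_live.sh as a side-effect-free function.
# "decide" and the labels it prints are hypothetical names.
#
# decide HTTPD_COUNT TOMCAT_COUNT STATUS HEARTBEAT_COUNT
# Prints one of:
#   ok        everything is running and the application answers
#   failover  something is wrong and heartbeat still runs on this node
#   standby   something is wrong but heartbeat is already stopped here
decide() {
    local httpd=$1 tomcat=$2 status=$3 heartbeat=$4

    if [ "$httpd" -ne 0 ] && [ "$tomcat" -ne 0 ] && [ "$status" = "Status: OK" ]; then
        echo ok
    elif [ "$heartbeat" -ne 0 ]; then
        echo failover
    else
        echo standby
    fi
}

# Possible use inside the cron script (left commented out on purpose):
# case "$(decide "$SERVICE_HTTPD" "$SERVICE_TOMCAT" "$SERVICE_STATUS" "$SERVICE_HEARTBEAT")" in
#     failover) /etc/init.d/heartbeat stop ;;
# esac
```

With the decision separated like this, the cron script only has to collect the four values and act on the printed label, and the logic can be checked by hand with made-up inputs.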

check_http.pl is a Perl script (from a Nagios plugin, I believe):

## check_http.pl
## Copyright (c) 2008, Oliver Wittenburg <oli...@wiburg.de>
##
## This program is free software: you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
.......

Cheers,
Leon Kolchinsky

On Thu, Sep 23, 2010 at 04:30, Christopher Schultz <ch...@christopherschultz.net> wrote:

> Shashank,
>
> On 9/22/2010 8:30 AM, Mendiratta, Shashank wrote:
> > Thanks. About that: outbound port 80 is blocked here, so we cannot
> > wget. Moreover, this won't solve the problem of why the services
> > are getting hung.
>
> Hmm. Can you monitor from the server itself? That's not unusual to do.
> Also, connections to localhost:80 usually work even when software-based
> firewalls are in place, since the local host is usually considered trusted.
>
> > Well, I had an idea, please critique it. Why not monitor the server.log
> > file for errors? If we get some kind of error, we send an alert and then
> > restart the service. Before that, we have to make a repository of the
> > types of error that can occur.
>
> We have one particularly poorly-written webapp that has a habit of
> running out of memory. We have segregated it into its own Tomcat
> instance and actually do scan the log file for errors in the way you
> describe.
>
> The script is essentially this:
>
> grep -m 1 OutOfMemoryError ${LOGFILE} > /dev/null
> if [ "$?" == "0" ] ; then
>     # notify an administrator
> fi
>
> It's not particularly elegant, but it gets the job done.
>
> -chris
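
One thing to watch with the quoted log scan: as written, grep would match the same OutOfMemoryError line again on every run until the log is rotated. Below is a minimal sketch of one way around that, assuming you keep a small state file holding the number of lines already scanned. The function name new_oom_lines and the state file are my own invention for illustration, not something from Chris's setup.

```shell
#!/bin/bash
# Sketch only: report OutOfMemoryError lines added since the previous run.
# "new_oom_lines" and the state file are hypothetical, not from the quoted script.
#
# new_oom_lines LOGFILE STATEFILE
# Prints the new matching lines; returns 0 if there was at least one, 1 otherwise.
new_oom_lines() {
    local logfile=$1 statefile=$2 seen=0 total rc

    # Number of lines scanned on the previous run (0 on the first run)
    if [ -f "$statefile" ]; then
        seen=$(cat "$statefile")
    fi

    # Current length of the log; the arithmetic expansion strips any padding
    total=$(( $(wc -l < "$logfile") ))

    # A shorter log than last time means it was rotated or truncated
    if [ "$total" -lt "$seen" ]; then
        seen=0
    fi

    # Look only at lines seen+1 .. total
    awk -v s="$seen" -v t="$total" \
        'NR > s && NR <= t && /OutOfMemoryError/ { print; found = 1 }
         END { exit !found }' "$logfile"
    rc=$?

    echo "$total" > "$statefile"
    return $rc
}

# Possible use from cron (left commented out on purpose):
# if new_oom_lines "$LOGFILE" /var/tmp/oom_scan.state > /dev/null; then
#     : # notify an administrator
# fi
```

The price is one extra file to look after, but each error is then reported once instead of on every cron run.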