On Mon, Mar 23, 2009 at 03:22:06PM -0700, Bill Kendrick wrote:
>
> So I'm using lighttpd and fast_cgi, which occasionally has a problem where
> it gets 'stuck'. (Unable to bring fast_cgi back to life, even though
> resources are once again available.) Usually this results in Error 500s
> that never go away until lighttpd is restarted.
>
> So to avoid having to manually go in and resurrect the server, I created
> a shell script that tries to hit the site, checks for an HTTP 200 response,
> and if it doesn't see that, it does a 'tail' of the access and error logs
> (so that I can see what was happening at the time), and then invokes an
> "/etc/init.d/lighttpd restart" to kick the server.
>
> I've got the following crontab entry:
>
>   */2 * * * * root THE_SCRIPT
>
> meaning it should run once every 2 minutes, all the time. I only get an
> email when it produces output, and it only does that if it fails to
> contact the webserver.
>
> However, when it does fail, I get numerous reports at once. Could this
> be because the server isn't responding immediately when I check the status?
>
> I'm doing that via, in the shell script:
>
>   STATUS=`wget --save-headers http://www.MYSITE.com/ -O - 2> /dev/null | head -1 | cut -d " " -f 2`
>
> In other words, hit the site, save the headers, send them out to stdout,
> and chop off the "HTTP/1.1" to get the delicious "200" (hopefully) status.
>
> I guess maybe I need to give it a "--timeout" argument, and something
> less than 120 seconds, so that the jobs don't run over each other...?
If the server is running and accepts a connection, but does not report back a 200, then I would imagine wget will just hang there waiting for a response. Is it accepting a socket connection, but not reporting back?

What if you put a lock file in your script, so that it exits if another copy is already running? See:

20.9.1 Locking a mailbox file
http://rute.2038bug.com/node23.html.gz#SECTION002390000000000000000

Have you thought about using NAGIOS? It's tricky to configure, but there is a NAGIOS book available through http://safari.oreilly.com. I believe it covers configuring Nagios to take action when a service is down.

  Nagios, 2nd Edition
  by Wolfgang Barth
  Publisher: No Starch Press
  Pub Date: October 28, 2008
  Print ISBN-13: 978-1-593-27179-4
  Pages: 720

There is also the Linux Networking Cookbook. It has some fast, easy methods for monitoring your httpd service.

  Linux Networking Cookbook
  by Carla Schroder
  Publisher: O'Reilly Media, Inc.
  Pub Date: November 26, 2007
  Print ISBN-10: 0-596-10248-8
  Print ISBN-13: 978-0-596-10248-7
  Pages: 456

It has a NAGIOS section, and it is also available through the Safari site. I imagine you might have some other sources as well. ;-)

Or, you could write your own socket code using select(). Create your socket file descriptor and pass it to the following function, adapted from the GNU C Library manual:

http://www.gnu.org/software/hello/manual/libc/Waiting-for-I_002fO.html

    #define _GNU_SOURCE          /* glibc only defines TEMP_FAILURE_RETRY with this */
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/types.h>
    #include <sys/time.h>

    /* Wait up to SECONDS for FILEDES to become readable. */
    int
    input_timeout (int filedes, unsigned int seconds)
    {
      fd_set set;
      struct timeval timeout;

      /* Initialize the file descriptor set. */
      FD_ZERO (&set);
      FD_SET (filedes, &set);

      /* Initialize the timeout data structure. */
      timeout.tv_sec = seconds;
      timeout.tv_usec = 0;

      /* select returns 0 if timeout, 1 if input available, -1 if error.
         TEMP_FAILURE_RETRY restarts the call if a signal interrupts it. */
      return TEMP_FAILURE_RETRY (select (FD_SETSIZE, &set, NULL, NULL, &timeout));
    }

    int
    main (void)
    {
      fprintf (stderr, "select returned %d.\n",
               input_timeout (STDIN_FILENO, 5));
      return 0;
    }

brian
-- 
Brian Lavender
http://www.brie.com/brian/
_______________________________________________
vox-tech mailing list
[email protected]
http://lists.lugod.org/mailman/listinfo/vox-tech
