We use Nagios to monitor our OSCAR 3.0 cluster. Some tasks were harder than others, but the easiest thing to get working was sending out emails when machines crash. I didn't have to write any plugins; I just used the out-of-the-box ssh check to make sure that all of the machines were accepting connections on ssh. (If a machine is not, it has "crashed" and I get sent an email.) Since Nagios doesn't really run any code on the cluster nodes, only on the head node (or a separate Nagios node if you can spare the silicon), the setup is no different for an OSCAR cluster than for any other kind of network. It also doesn't really impact our performance, since our head node is not a bottleneck, but your mileage may vary. Good luck.
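To make the idea concrete, here is a rough Python sketch of what "accepting connections on ssh" means as a check. This is not the code of the Nagios ssh plugin, only an illustration of the same idea: open a TCP connection to the ssh port and see whether the server sends an SSH banner. The function names, the node names, and the 5 second timeout are all assumptions made for the example.

```python
import socket


def ssh_is_accepting(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if host:port accepts a TCP connection and sends an SSH banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            # An ssh server announces itself first, e.g. b"SSH-2.0-OpenSSH_...".
            banner = conn.recv(255)
    except OSError:
        # Refused, unreachable, unresolvable, or timed out: treat as down.
        return False
    return banner.startswith(b"SSH-")


def crashed_nodes(hosts, port: int = 22, timeout: float = 5.0):
    """Return the hosts that are not accepting ssh connections."""
    return [h for h in hosts if not ssh_is_accepting(h, port, timeout)]


if __name__ == "__main__":
    # Hypothetical node names; substitute the names of your own cluster nodes.
    nodes = ["oscarnode1", "oscarnode2"]
    for node in crashed_nodes(nodes):
        print(f"{node}: not accepting ssh connections")
```

Like the Nagios setup, this runs only on the head node (or a monitoring node) and needs nothing installed on the cluster nodes. Sending the email is left out; Nagios handles that part through its own notification settings.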
- andy g

> Folks,
> Is there anyway to get clumon and/or ganglia to send email, etc, when a
> node goes down? It shouldn't be too hard to add something to the code,
> but I was wondering if it was already there, before diving in and doing
> my own thing.
>
> Thanks
> Frank

_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
