We use Nagios to monitor our OSCAR 3.0 cluster.  Some tasks were harder
than others, but the easiest thing to get working was sending out emails
when machines crash.  I didn't have to write any plugins; I just used the
out-of-the-box ssh check to make sure that all of the machines were
accepting connections on ssh.  (If one is not, it counts as "crashed" and
I get sent an email.)  Since Nagios doesn't really run any code on the
cluster nodes, only on the head node (or a separate Nagios node if you can
spare the silicon), the setup is no different for an OSCAR cluster than
for any other kind of network.  It also doesn't really impact our
performance, since our head node is not a bottleneck, but your mileage may
vary.  Good luck.
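
In case it helps to see what "accepting connections on ssh" boils down to,
here is a rough Python sketch of the idea.  This is not the Nagios plugin
itself and not our actual config; the function names (ssh_is_accepting,
find_down_nodes) are made up for illustration, and the mail-sending step
is left to Nagios.  It runs only on the monitoring host and just opens a
TCP connection to each node.

```python
import socket


def ssh_is_accepting(host, port=22, timeout=5.0):
    """Return True if `host` accepts a TCP connection and sends an SSH banner.

    Hypothetical helper for illustration only.  Simplification: it assumes
    the first bytes the server sends are the "SSH-" identification string.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(256)
    except OSError:
        # Refused, unreachable, or timed out: treat the node as down.
        return False
    return banner.startswith(b"SSH-")


def find_down_nodes(hosts, check=ssh_is_accepting):
    """Return the hosts for which the check fails (the "crashed" ones)."""
    return [host for host in hosts if not check(host)]
```

Calling find_down_nodes() with a list of node names would give back the
nodes that need an alert.  A node that is up but has a dead sshd shows up
as "crashed" too, which is the same trade-off we accept with the ssh check.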

- andy g


> Folks,
>       Is there any way to get clumon and/or ganglia to send email, etc, when a
> node goes down?  It shouldn't be too hard to add something to the code,
> but I was wondering if it was already there, before diving in and doing
> my own thing.
>
> Thanks
> Frank
>



