Hi Ben,

2009/6/22 [email protected] <[email protected]>:
>
> Features:
>  + Email notifications on critical events (that I can specify)
>  + Overview of all systems being monitored showing current status
>
>
> Monitoring:
>
> Critical:
>  * status of software RAID6 array (eg. if any drive fails, even if a hot
>    spare is available)
>  * usage % of various partitions
>  * monitor the status of my VMs (I intend to use virtualbox)
>  * monitor the status of backups (haven't yet determined what system I'll be
>    using)
>
> Desirable:
>  * monitor my UPS
>    + trigger shutdowns in VMs and then main system if power goes out.
>
> Future:
>  * monitor web logs on servers for hits, usage, etc.
>  * monitor security related logs on servers.
>
> Will it be simpler to use multiple tools, or is there some giant swiss army
> knife that it's worth learning?

What you're trying to achieve broadly falls into two categories:

 * data collection
 * notification

I find that most of the monitoring tools out there try to do both, and
don't quite manage to pull it off.

For the data collection, I would recommend using something like
collectd[0]. It can collect stats on disk space, I/O throughput, UPS
usage, web server usage (apache2 + nginx), VM utilisation, and a whole
bunch of other things. It's also network aware, so you can collect stats
on each of your machines individually and aggregate the results in one
place.

For the notification, the easiest option would be Nagios[1]. collectd
provides a collectd-nagios[2] binary which can be used to query stats
that collectd has collected, and return a warning when a value is out of
range (which Nagios will pick up and notify you about).

For quick status checks (questions like "is mdadm reporting any
failures?"), you can search for an existing Nagios check that suits your
taste, or write one yourself.

The main advantage of breaking the problem up like this is that you can
swap out parts of the system when something better comes along.

Oh, and for triggering shutdowns from your UPS, try something like
Apcupsd[3].

Lindsay

[0] http://collectd.org/
[1] http://nagios.org/
[2] http://collectd.org/documentation/manpages/collectd-nagios.1.shtml
[3] http://www.apcupsd.com/

-- 
http://holmwood.id.au/~lindsay/ (me)
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
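
P.S. For the "write a Nagios check yourself" option, here is a rough
sketch of what a software RAID check could look like. It is only an
illustration, not a tested plugin: the function names (failing_arrays,
check, main) are made up for this example, and it assumes the usual
Linux /proc/mdstat layout, where each array has a header line such as
"md0 : active raid6 ..." followed by a status field like "[5/4] [U_UUU]".
The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard
Nagios plugin convention.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check for Linux software RAID (/proc/mdstat)."""
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches the member-status field, e.g. "[5/4] [U_UUU]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def failing_arrays(mdstat_text):
    """Return names of arrays with a failed or missing member device."""
    failing = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            # "(F)" marks a failed device; it stays listed even after a
            # hot spare has taken over, until the device is removed.
            if "(F)" in line:
                failing.append(current)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            total, active, flags = status.groups()
            missing = int(active) < int(total) or "_" in flags
            if missing and current not in failing:
                failing.append(current)
            current = None
    return failing


def check(mdstat_text):
    """Map the parsed state to a (exit_code, message) pair."""
    bad = failing_arrays(mdstat_text)
    if bad:
        return CRITICAL, "RAID CRITICAL - problem with: " + ", ".join(bad)
    return OK, "RAID OK - no failed or missing members"


def main(path="/proc/mdstat"):
    """Read mdstat, print one status line, and return the exit code."""
    try:
        with open(path) as handle:
            text = handle.read()
    except OSError as err:
        print("RAID UNKNOWN - cannot read %s: %s" % (path, err))
        return UNKNOWN
    code, message = check(text)
    print(message)
    return code
```

To try it as an actual plugin, end the file with
"raise SystemExit(main())" so the return value becomes the process exit
status, which is what Nagios looks at. Flagging "(F)" on the header line
is there to cover the "even if a hot spare is available" case from your
list.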
