Hi Ben,

2009/6/22 [email protected] <[email protected]>:
>
> Features:
>  + Email notifications on critical events (that I can specify)
>  + Overview of all systems being monitored showing current status
>
>
> Monitoring:
>
> Critical:
> * status of software RAID6 array (eg. if any drive fails, even if a hot
> spare is available)
> * usage % of various partitions
> * monitor the status of my VMs (I intend to use virtualbox)
> * monitor the status of backups (haven't yet determined what system I'll be
> using)
>
> Desirable:
> * monitor my UPS
>  + trigger shutdowns in VMs and then main system if power goes out.
>
> Future:
> * monitor web logs on servers for hits, usage, etc.
> * monitor security related logs on servers.
>
> Will it be simpler to use multiple tools, or is there some giant swiss army
> knife that it's worth learning?

What you're trying to achieve broadly falls into two categories:

 * data collection
 * notification

I find that most of the monitoring tools out there try to do both, and
don't quite manage to pull it off.

For the data collection, I would recommend using something like
collectd[0]. It can collect stats on disk space, I/O throughput, UPS
usage, web server usage (apache2 + nginx), VM utilisation, and a whole
bunch of other things. It's also network-aware, so you can collect
stats on each of your machines individually, and aggregate the results
in one place.

For the notification, the easiest option would be Nagios[1]. collectd
provides a collectd-nagios[2] binary which can query the stats that
collectd has collected, and return a warning or critical status when a
value is out of range (which Nagios will pick up and notify you
about). For quick status checks (questions like "is mdadm reporting
any failures?"), you can Google for an existing Nagios check that
suits your taste, or write one yourself.
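
As a rough idea of what a hand-written check might look like, here is
a minimal Python sketch that reads /proc/mdstat and reports CRITICAL
if any array has a missing or failed member. The function names
(check_mdstat, main) are made up for illustration, and the parsing
assumes the usual /proc/mdstat layout, so treat it as a starting point
rather than a finished plugin. The exit codes are the standard Nagios
plugin ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Sketch of a Nagios-style check for Linux software RAID (/proc/mdstat)."""
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_mdstat(mdstat_text):
    """Return (exit_code, message) for the given /proc/mdstat contents."""
    arrays = []
    problems = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like: "md0 : active raid6 sdb1[0] sdc1[1] ..."
        header = re.match(r"(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            arrays.append(current)
            # A failed member is listed with an "(F)" suffix until it is removed.
            if "(F)" in line:
                problems.append(f"{current}: member marked failed")
            continue
        # Status lines end with something like "[5/5] [UUUUU]";
        # an underscore in the member map means a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status and "_" in status.group(3):
            problems.append(f"{current}: degraded [{status.group(3)}]")
    if not arrays:
        return UNKNOWN, "UNKNOWN - no md arrays found"
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, "OK - " + ", ".join(arrays) + " healthy"


def main(path="/proc/mdstat"):
    """Read mdstat, print the one-line status Nagios expects, return the exit code."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError as exc:
        print(f"UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    code, message = check_mdstat(text)
    print(message)
    return code
```

To use it as a plugin you would add "import sys" and finish the script
with sys.exit(main()), so that Nagios sees the exit code. Because it
also looks for the (F) marker, it should still complain after a hot
spare has taken over, for as long as the failed disk remains listed in
the array.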

The main advantage of breaking the problem up like this is you can
swap out parts of the system when something better comes along.

Oh, and for triggering shutdowns from your UPS, try something like Apcupsd[3].

Lindsay

[0] http://collectd.org/
[1] http://nagios.org/
[2] http://collectd.org/documentation/manpages/collectd-nagios.1.shtml
[3] http://www.apcupsd.com/

-- 
http://holmwood.id.au/~lindsay/ (me)
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
