I recommend apcupsd because I've used it, but I notice collectd has a NUT plugin, so knock yourself out with whatever. :-)

Lindsay

2009/6/22 Dean Hamstead <[email protected]>:
> I would have to recommend NUT over apcupsd.
>
> Dean
>
> On 6/22/2009, "Ben" <[email protected]> wrote:
>
>>Hi Lindsay,
>>
>>Thanks for that comprehensive answer.
>>
>>So collectd runs on each system itself, but I assume Nagios is centralised
>>at some point, so where would be the most sensible place to do that? Is
>>there ultra reliable hosting built for just that purpose?
>>
>>
>>2009/6/22 Lindsay Holmwood <[email protected]>
>>
>>> Hi Ben,
>>>
>>> 2009/6/22 [email protected] <[email protected]>:
>>> >
>>> > Features:
>>> > + Email notifications on critical events (that I can specify)
>>> > + Overview of all systems being monitored showing current status
>>> >
>>> >
>>> > Monitoring:
>>> >
>>> > Critical:
>>> > * status of software RAID6 array (eg. if any drive fails, even if a hot
>>> > spare is available)
>>> > * usage % of various partitions
>>> > * monitor the status of my VMs (I intend to use virtualbox)
>>> > * monitor the status of backups (haven't yet determined what system I'll
>>> > be using)
>>> >
>>> > Desirable:
>>> > * monitor my UPS
>>> > + trigger shutdowns in VMs and then main system if power goes out.
>>> >
>>> > Future:
>>> > * monitor web logs on servers for hits, usage, etc.
>>> > * monitor security related logs on servers.
>>> >
>>> > Will it be simpler to use multiple tools, or is there some giant swiss
>>> > army knife that it's worth learning?
>>>
>>> What you're trying to achieve broadly falls into two categories:
>>>
>>> * data collection
>>> * notification
>>>
>>> I find that most of the monitoring tools out there try to do both, and
>>> don't quite manage to pull it off.
>>>
>>> For the data collection, I would recommend using something like
>>> collectd[0]. It can collect stats on disk space, io throughput, ups
>>> usage, web server usage (apache2 + nginx), vm utilisation, and a whole
>>> bunch of other things. It's also network aware, so you can collect
>>> stats on all your machines individually, and aggregate the results in
>>> one place.
>>>
>>> For the notification, the easiest option would be Nagios[1]. collectd
>>> provides a collectd-nagios[2] binary which can be used to query stats
>>> that collectd has collected, and return warnings depending on whether
>>> values are out of range (which Nagios will pick up and notify you
>>> about). For quick status checks (questions like "is mdadm reporting
>>> any failures?"), you can Google for one that suits your taste, or
>>> write a Nagios check yourself to do it.
>>>
>>> The main advantage of breaking the problem up like this is you can
>>> swap out parts of the system when something better comes along.
>>>
>>> Oh, and for triggering shutdowns from your UPS, try something like
>>> Apcupsd[3].
>>>
>>> Lindsay
>>>
>>> [0] http://collectd.org/
>>> [1] http://nagios.org/
>>> [2] http://collectd.org/documentation/manpages/collectd-nagios.1.shtml
>>> [3] http://www.apcupsd.com/
>>>
>>> --
>>> http://holmwood.id.au/~lindsay/ (me)
>>> --
>>> SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
>>> Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
>>>

--
http://holmwood.id.au/~lindsay/ (me)
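
On the "write a Nagios check yourself" suggestion in the quoted thread, here is a rough sketch of what a hand-rolled software RAID check might look like in Python. It assumes the usual /proc/mdstat layout (an "mdN : ..." line, then a status line ending in something like "[4/4] [UUUU]"), and it relies on the standard Nagios plugin convention that exit code 0 means OK, 2 means CRITICAL and 3 means UNKNOWN. The names find_problems, check and main are made up for illustration; this is not part of collectd or Nagios, and it has not been tried against every mdstat variant.

```python
"""Sketch of a Nagios-style check for Linux software RAID (md) arrays."""
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed /proc/mdstat layout: "md0 : active raid6 sda1[0] sdb1[1] ..."
ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# Assumed status layout: "... [4/4] [UUUU]" ("_" marks a missing member).
STATUS_RE = re.compile(r"\[\d+/\d+\]\s*\[([U_]+)\]")


def find_problems(mdstat_text):
    """Return a list of human-readable problems found in mdstat output."""
    problems = []
    current = None
    for line in mdstat_text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            # A failed member is flagged with "(F)" after its device name.
            if "(F)" in line:
                problems.append(f"{current}: member marked failed")
            continue
        status = STATUS_RE.search(line)
        if status and current and "_" in status.group(1):
            problems.append(f"{current}: degraded [{status.group(1)}]")
    return problems


def check(mdstat_text):
    """Map mdstat output to a (Nagios exit code, status message) pair."""
    problems = find_problems(mdstat_text)
    if problems:
        return CRITICAL, "CRITICAL: " + "; ".join(problems)
    return OK, "OK: no failed or degraded md arrays"


def main(path="/proc/mdstat"):
    """Read mdstat, print one status line, return the exit code."""
    try:
        with open(path) as handle:
            text = handle.read()
    except OSError as exc:
        print(f"UNKNOWN: cannot read {path}: {exc}")
        return UNKNOWN
    code, message = check(text)
    print(message)
    return code
```

To use it as a plugin you would end the script with sys.exit(main()) (after importing sys) and point a Nagios command definition at it. Checking for "(F)" as well as "_" is deliberate: with a hot spare the array can look complete again once the rebuild finishes, while the failed drive is still sitting there waiting to be replaced.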
