I recommend apcupsd because I've used it, but I notice collectd has a
NUT (Network UPS Tools) plugin, so knock yourself out with whatever. :-)

Lindsay

2009/6/22 Dean Hamstead <[email protected]>:
> I would have to recommend NUT over apcupsd.
>
> Dean
>
> On 6/22/2009, "Ben" <[email protected]> wrote:
>
>>Hi Lindsay,
>>
>>Thanks for that comprehensive answer.
>>
>>So collectd runs on each system itself, but I assume Nagios is centralised
>>somewhere, so where would be the most sensible place to run it? Is
>>there ultra-reliable hosting built for just that purpose?
>>
>>
>>
>>2009/6/22 Lindsay Holmwood <[email protected]>
>>
>>> Hi Ben,
>>>
>>> 2009/6/22 [email protected] <[email protected]>:
>>> >
>>> > Features:
>>> >  + Email notifications on critical events (that I can specify)
>>> >  + Overview of all systems being monitored showing current status
>>> >
>>> >
>>> > Monitoring:
>>> >
>>> > Critical:
>>> > * status of the software RAID6 array (e.g. if any drive fails, even if a
>>> > hot spare is available)
>>> > * usage % of various partitions
>>> > * monitor the status of my VMs (I intend to use virtualbox)
>>> > * monitor the status of backups (haven't yet determined what system I'll
>>> > be using)
>>> >
>>> > Desirable:
>>> > * monitor my UPS
>>> >  + trigger shutdowns in VMs and then the main system if power goes out.
>>> >
>>> > Future:
>>> > * monitor web logs on servers for hits, usage, etc.
>>> > * monitor security related logs on servers.
>>> >
>>> > Will it be simpler to use multiple tools, or is there some giant Swiss
>>> > Army knife that's worth learning?
>>>
>>> What you're trying to achieve broadly falls into two categories:
>>>
>>>  * data collection
>>>  * notification
>>>
>>> I find that most of the monitoring tools out there try to do both, and
>>> don't quite manage to pull it off.
>>>
>>> For the data collection, I would recommend using something like
>>> collectd[0]. It can collect stats on disk space, I/O throughput, UPS
>>> usage, web server usage (Apache 2 and nginx), VM utilisation, and a whole
>>> bunch of other things. It's also network-aware, so you can collect
>>> stats on all your machines individually and aggregate the results in
>>> one place.
>>>
>>> For notification, the easiest option would be Nagios[1]. collectd
>>> provides a collectd-nagios[2] binary that queries the values collectd
>>> has collected and returns a warning or critical status when they fall
>>> outside the ranges you specify (which Nagios will pick up and notify
>>> you about). For quick status checks (questions like "is mdadm
>>> reporting any failures?"), you can Google for an existing check that
>>> suits your taste, or write a Nagios check yourself.
>>>
>>> The main advantage of breaking the problem up like this is you can
>>> swap out parts of the system when something better comes along.
>>>
>>> Oh, and for triggering shutdowns from your UPS, try something like
>>> Apcupsd[3].
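For the "monitor my UPS" item: the actual shutdown is still apcupsd's job, driven by its own thresholds in apcupsd.conf. The Python sketch below only shows how the UPS state could also be surfaced in Nagios. It assumes the `KEY : value` lines printed by `apcaccess status` (fields such as `STATUS` and `TIMELEFT`) are passed in as text; the names `parse_apcaccess`, `leading_number` and `check_ups` and the 5-minute critical threshold are hypothetical choices.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_apcaccess(text):
    """Parse 'KEY : value' lines (as printed by `apcaccess status`) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing times keep theirs.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value):
    """'4.0 Minutes' -> 4.0; None if the value has no leading number."""
    try:
        return float(value.split()[0])
    except (IndexError, ValueError):
        return None


def check_ups(text, crit_minutes=5.0):
    """WARNING while on battery, CRITICAL once runtime drops to crit_minutes."""
    fields = parse_apcaccess(text)
    status = fields.get("STATUS")
    if status is None:
        return UNKNOWN, "UNKNOWN - no STATUS field in apcaccess output"
    if "ONBATT" not in status.split():
        return OK, f"OK - UPS status {status}"
    minutes = leading_number(fields.get("TIMELEFT", ""))
    if minutes is not None and minutes <= crit_minutes:
        return CRITICAL, f"CRITICAL - on battery, {minutes:g} minutes left"
    return WARNING, f"WARNING - on battery, TIMELEFT {fields.get('TIMELEFT', 'unknown')}"
```

Keeping the check separate from apcupsd means a Nagios alert never decides whether the machine shuts down; it only tells you what the UPS is doing.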
>>>
>>> Lindsay
>>>
>>> [0] http://collectd.org/
>>> [1] http://nagios.org/
>>> [2] http://collectd.org/documentation/manpages/collectd-nagios.1.shtml
>>> [3] http://www.apcupsd.com/
>>>
>>> --
>>> http://holmwood.id.au/~lindsay/ (me)
>>> --
>>> SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
>>> Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
>>>
>



-- 
http://holmwood.id.au/~lindsay/ (me)
