Re: [Nagios-users] Large Installation
On 06/10/2010 07:51 PM, Scott Ward wrote:
> We are looking to do a large installation of Nagios. Is it possible to monitor over 800 machines and over 14000 services? Has anyone tried doing anything like this? If you have, how successful was it and how did you configure it?

We have plenty of customers with far more than 1000 hosts. 800 should just be a matter of running Nagios on decently beefy hardware. Don't attempt it with a virtual system though. They have notoriously crappy performance with multi-fork()'ing applications, and if you ever hit the swap, they'll degrade even further.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace.

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Large Installation
We are going to be using distributed monitoring for sure. We just cannot decide whether we should use NDO to write directly to the database or use NSCA to send results back to the master server. Any suggestions?

Is there a frontend that actually uses the information in an NDO db? From what I've read it looks like the default Nagios front end uses text files.

~Scott Ward

On Fri, Jun 11, 2010 at 4:48 AM, Martin Melin nag...@martinmelin.com wrote:
> On Thu, Jun 10, 2010 at 21:55, Kevin Keane subscript...@kkeane.com wrote:
> > Config file maintenance can be improved to some extent with careful design of the config files, as well as tools. It is an issue that I am running into with a relatively small installation with 80+ hosts and 400+ services. My installation is highly heterogeneous and very dynamic, which makes config file maintenance a nightmare. Having to restart Nagios after a configuration change doesn't help either. On the other hand, a network with 2000 identical machines is probably going to be much easier to manage than my type of network.
>
> Nitpicking or helpful tip, you decide: Nagios reloads config changes on SIGHUP, you don't have to do a restart. A full restart can take a while on a sufficiently sized installation so having to do one for every change would indeed be a PITA, but I've never seen a reload take more than a few seconds.
>
> Cheers
> Martin
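The SIGHUP reload Martin mentions can be sketched in Python as follows. This is only an illustration under assumptions: the helper names `read_pid` and `reload_nagios` are made up for this example, and the lock file path is whatever `lock_file` points to in your own nagios.cfg (the path in the usage note is a common source-install default, not something stated in this thread).

```python
import os
import signal
from pathlib import Path


def read_pid(lock_file: str) -> int:
    """Return the PID stored as the first token of a Nagios-style lock file."""
    tokens = Path(lock_file).read_text().split()
    if not tokens:
        raise ValueError(f"{lock_file} is empty")
    pid = int(tokens[0])
    if pid <= 0:
        raise ValueError(f"{lock_file} holds an invalid PID: {pid}")
    return pid


def reload_nagios(lock_file: str) -> int:
    """Send SIGHUP to the process named in lock_file and return its PID.

    ASSUMPTION: lock_file is the file the running Nagios daemon wrote its
    PID to; this helper does not check that the PID really is Nagios.
    """
    pid = read_pid(lock_file)
    os.kill(pid, signal.SIGHUP)
    return pid
```

Usage would look like `reload_nagios("/usr/local/nagios/var/nagios.lock")`, with the path adjusted to your installation. It is worth verifying the configuration before signalling, since a reload with a broken config is no better than a restart with one.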
Re: [Nagios-users] Large Installation
On 06/11/2010 03:04 PM, Scott Ward wrote:
> We are going to be using distributed monitoring for sure. We just cannot decide whether we should use NDO to write directly to the database or use NSCA to send results back to the master server. Any suggestions?
>
> Is there a frontend that actually uses the information in an NDO db? From what I've read it looks like the default Nagios front end uses text files.

Unless you desperately need performance data from satellite systems handled properly, I'd invite you to give Merlin and Ninja a try.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Re: [Nagios-users] Large Installation
Hi Scott,

You can also try the Centreon software to manage your different pollers and their configuration: http://www.centreon.com

Here is an overview of how it works: http://en.doc.centreon.com/CentreonArchitecture

To see what it looks like, here is a web demo: http://demo.centreon.com

Best regards.

--
Romain LE MERLUS rlemer...@merethis.com
Tel. +33 (0)1 49 69 97 12
Mob. +33(0)6 85 05 02 82
Re: [Nagios-users] Large Installation
On 06/11/2010 04:08 PM, Scott Ward wrote:
> How does Merlin compare with NDO in terms of resource usage?

merlin is fairly lightweight. What little memory it uses resides primarily on the stack and fits well inside the stack of 1MiB. Here's the output of ps wwaux | grep merlin on a master system with two connected pollers. As you can see, grep consumes more memory than the merlin daemon does. This is with debug symbols compiled in btw, so it will be roughly half that when it's built for production.

root 12286 0.0 0.2 61116 660 pts/0 R+ 17:29 0:00 grep -i merlin
root 23236 0.0 0.7 50572 1856 ? S 13:56 0:01 /opt/monitor/op5/merlin/merlind -c /opt/monitor/op5/merlin/merlin.conf

As for CPU usage, it's definitely more lightweight than NDO. A typical merlin daemon will basically idle away most of its time. It's the database that does the heavy lifting after all, so it's not that hard to make merlin itself lean and extremely quick.

As for storage space, it doesn't use nearly as much as ndoutils does, since we don't store the entire log and all status updates in the database, but only the current status and statechanges. A statechange is defined as either the state having changed, or the object having gone from soft to hard state, which is basically all we need to make reports look good.

Since the logfiles are already partitioned by date, it was deemed a lot easier to write a super-fast parser for those instead and make that parser able to display html output. This is the helper we use in ninja, and it's working extremely well, showing interesting logdata in a matter of seconds.

The database will grow over time of course, but while NDOUtils' database can grow to tens of gigabytes in a matter of months for a large network, merlin stores about 500MiB for a whole year for the same size network.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
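The statechange rule Andreas describes (store a row when the state has changed, or when the object went from soft to hard) can be written down as a small predicate. This is a sketch of that one-sentence definition only, not Merlin's actual code; the names `CheckState`, `is_statechange` and `rows_to_store` are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckState:
    state: int  # plugin return code: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    hard: bool  # True for a hard state, False for a soft state


def is_statechange(previous: CheckState, current: CheckState) -> bool:
    """True when the state value differs, or the object went from soft to hard."""
    state_differs = previous.state != current.state
    soft_to_hard = (not previous.hard) and current.hard
    return state_differs or soft_to_hard


def rows_to_store(history: List[CheckState]) -> List[CheckState]:
    """Keep only the results that are statechanges relative to their predecessor."""
    kept = []
    for previous, current in zip(history, history[1:]):
        if is_statechange(previous, current):
            kept.append(current)
    return kept
```

Under this reading, a service that stays hard CRITICAL for a day produces no new rows after the soft-to-hard transition, which fits the storage figures quoted above: repeated identical results are never written.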
Re: [Nagios-users] Large Installation
I can attest / confirm what Andreas states about the merlin daemon.

BTW, Andreas, I just patched our code base to contain your 0.6.7 changes and I will be posting that on Github for you and anyone else interested to check out over the weekend.

Our tests so far are showing that with the Merlin NEB and daemon on a poller we lose less than 10% capacity on the poller compared to the poller without the NEB module and Merlind - our test poller is running 10k active service checks and 1k active host checks in less than 5 minutes with polling headroom to spare.

- Max
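To put Max's figures next to the original question (14000 services), here is a back-of-the-envelope calculation. It assumes, purely for illustration, that every one of the 14000 services is checked once per 5 minutes; the thread does not state a check interval, and host checks for the 800 machines are left out.

```python
import math


def checks_per_second(checks: int, window_seconds: float) -> float:
    """Average check rate needed to finish `checks` within `window_seconds`."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return checks / window_seconds


WINDOW = 5 * 60  # seconds

# Max's test poller: 10k service checks + 1k host checks in under 5 minutes,
# so this is a lower bound on what that poller sustained.
poller_rate = checks_per_second(10_000 + 1_000, WINDOW)

# ASSUMPTION: all 14000 services on a 5-minute interval.
required_rate = checks_per_second(14_000, WINDOW)

pollers_needed = math.ceil(required_rate / poller_rate)

print(f"poller sustains at least {poller_rate:.2f} checks/s")
print(f"target load needs about {required_rate:.2f} checks/s")
print(f"pollers of that size needed: {pollers_needed}")
```

The numbers only say that the target load is in the same range as a single test poller of the kind Max describes; real capacity depends on plugin run time, hardware and check intervals.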
Re: [Nagios-users] Large Installation
Our changes to Merlin allow N pollers to all write to the same database without conflicts.
Re: [Nagios-users] Large Installation
If you aren't actually using the data from NDO, there is little point in creating the DB.

I would probably not use NDO to write directly from the satellites. Here is why:

- Double the network traffic. The satellites have to send check results AND database writes.
- Less reliable. How would you keep the master server from writing the same information to the DB that a satellite has just written, and messing up the data?
- NDO can be a serious performance bottleneck; you wouldn't want your satellites to be a potential point of failure in terms of performance.
- If the satellites are behind a firewall, it may not even be possible to write directly to the DB.

From: Scott Ward [mailto:13.sward...@gmail.com]
Sent: Friday, June 11, 2010 6:05 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Large Installation

We are going to be using distributed monitoring for sure. We just cannot decide whether we should use NDO to write directly to the database or use NSCA to send results back to the master server. Any suggestions?

Is there a frontend that actually uses the information in an NDO db? From what I've read it looks like the default Nagios front end uses text files.

~Scott Ward
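For the NSCA route discussed in this message, a satellite hands each service result to the send_nsca client as one line of text. The sketch below builds such a line in the tab-separated layout send_nsca uses by default as far as I know (host, service description, return code, plugin output); please check it against the documentation of your installed NSCA version. The function name and the host and service names are hypothetical.

```python
def nsca_service_line(host: str, service: str, return_code: int, output: str) -> str:
    """Build one tab-separated passive service result line for send_nsca.

    ASSUMPTION: the default tab delimiter is in use and the result is for a
    service check (host checks omit the service description field).
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    for field in (host, service, output):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


# Hypothetical example values, not taken from the thread.
line = nsca_service_line("web01", "HTTP", 2, "CRITICAL - connection timed out")
```

Each satellite would pipe lines like this to send_nsca, so only check results cross the network, which is the "single stream of traffic" side of the comparison above.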
[Nagios-users] Large Installation
We are looking to do a large installation of Nagios. Is it possible to monitor over 800 machines and over 14000 services? Has anyone tried doing anything like this? If you have, how successful was it and how did you configure it?

~Rultax
Re: [Nagios-users] Large Installation
> We are looking to do a large installation of Nagios. Is it possible to monitor over 800 machines and over 14000 services?

Works like a charm :-)

> Has anyone tried doing anything like this? If you have, how successful was it and how did you configure it?

Same as for a small installation of NAGIOS

M.
Re: [Nagios-users] Large Installation
Nagios does have some scalability issues, but for the most part you won't run into them until you get to truly huge installations. I can see three main scalability issues: config file maintenance, the need for one central server, and firewall issues.

Config file maintenance can be improved to some extent with careful design of the config files, as well as tools. It is an issue that I am running into with a relatively small installation with 80+ hosts and 400+ services. My installation is highly heterogeneous and very dynamic, which makes config file maintenance a nightmare. Having to restart Nagios after a configuration change doesn't help either. On the other hand, a network with 2000 identical machines is probably going to be much easier to manage than my type of network.

The central server is an obvious bottleneck. No matter how powerful the machine and the network connection, there are only so many check results it can handle. Fortunately, Nagios doesn't require much horsepower. Distributed monitoring helps with this issue because the most expensive part of Nagios is running active checks. With distributed monitoring, the active checks can run on multiple smaller boxes, which then send the check results back as passive checks. Of course distributed monitoring compounds the config file maintenance issue, because you have to configure each check multiple times.

The third issue is not directly a scalability issue. Nagios is built with the assumption of a local and mostly trusted network. It's non-trivial to securely get checks to work on remote machines without poking pretty gaping holes into firewalls, and/or frequently establishing and tearing down encrypted connections with the attendant processing load. There are some third-party solutions for this issue, though.
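The "configure each check multiple times" problem can be eased by generating both copies of a service from one record. The sketch below is an illustration, not a complete setup: `service_definition` is a made-up helper, the `generic-service` template name is an assumption borrowed from the Nagios sample configs, and a real distributed setup needs more than these directives on both sides.

```python
def service_definition(host: str, description: str, check_command: str,
                       role: str, template: str = "generic-service") -> str:
    """Render one Nagios service object for a 'poller' or the 'master'.

    The poller runs the check actively; the master only accepts the result
    passively. ASSUMPTION: `template` exists in your configuration.
    """
    if role not in ("poller", "master"):
        raise ValueError("role must be 'poller' or 'master'")
    active = 1 if role == "poller" else 0
    directives = [
        ("use", template),
        ("host_name", host),
        ("service_description", description),
        ("check_command", check_command),
        ("active_checks_enabled", str(active)),
        ("passive_checks_enabled", str(1 - active)),
    ]
    body = "\n".join(f"    {key:<24}{value}" for key, value in directives)
    return "define service {\n" + body + "\n}\n"


# Hypothetical host and service, rendered once per role from the same record.
poller_cfg = service_definition("web01", "HTTP", "check_http", "poller")
master_cfg = service_definition("web01", "HTTP", "check_http", "master")
```

Keeping the record in one place and rendering it per role means a change to a check is made once, and the poller and master copies cannot drift apart.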

From: Scott Ward [mailto:13.sward...@gmail.com]
Sent: Thursday, June 10, 2010 12:34 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Large Installation

> Make sure to read these pages:
> http://nagios.sourceforge.net/docs/3_0/tuning.html
> http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html
>
> Also, if you're monitoring 800 machines across WANs, you might look into distributed monitoring:
> http://nagios.sourceforge.net/docs/3_0/distributed.html
>
> Let us know how it goes!

Thanks for the links. So the distributed monitoring described in the Nagios docs can handle what we're trying to do? I have read in a few places that Nagios has scalability issues.

> --Matt
>
> BTW, what are you using for your config maintenance?

We haven't decided yet. Do you have any recommendations?

~S
Re: [Nagios-users] Large Installation
Make sure to read these pages:

http://nagios.sourceforge.net/docs/3_0/tuning.html
http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Also, if you're monitoring 800 machines across WANs, you might look into distributed monitoring:

http://nagios.sourceforge.net/docs/3_0/distributed.html

Let us know how it goes!

--Matt

BTW, what are you using for your config maintenance?

On Thu, Jun 10, 2010 at 1:51 PM, Scott Ward 13.sward...@gmail.com wrote:
> We are looking to do an large installation of Nagios. Is it possible to monitor over 800 machines and over 14000 services? Has anyone tried doing anything like this? If you have how successful was it and how did you configure it?
>
> ~Rultax

--
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.
Re: [Nagios-users] Large Installation
I can't say that I've solved the scalability problem, but I don't have it, just because I've implemented a policy such that I never check any server over a WAN link, with the exception of another Nagios server (plus both ends of all of the WAN links themselves).

This does require one Nagios server per site, but to me, that's an appealing idea anyway, because I don't have a single point of failure. Any of my Nagios installations could die completely, and I'd be alerted by the others, just like any one internet connection could die, and I'd still get alerts about it. In the event of a weird failure, I can pretty much construct the network diagram based on which links are reporting up, and from where.

It does require a certain amount of configuration overhead, but most of that is done with templating anyway. I don't have my system laid out exactly like I want, but I'm implementing version control (subversion, in my case) and I have a different Nagios repository for each site. If I had more templates (or more shared configuration files), I would probably have a 'nagios-shared' repository, so I wouldn't have to replicate everything manually.

As for the arrangement of my configs, it mostly follows this howto that I did a year ago: http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/

Hope it can help someone

--Matt
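Matt's layout, where each site's Nagios server watches the Nagios servers at the other sites, can be modelled in a few lines. This is a toy model under assumptions: Matt does not describe how he reads the reports, so the rule used here (flag a site when every report received about it says "down") is my own simplification, and the site names are hypothetical.

```python
from typing import Dict, List, Tuple


def monitoring_pairs(sites: List[str]) -> List[Tuple[str, str]]:
    """Every site's Nagios server watches every other site's Nagios server."""
    return [(observer, target)
            for observer in sites
            for target in sites
            if observer != target]


def sites_reported_down(sites: List[str],
                        reports: Dict[Tuple[str, str], bool]) -> List[str]:
    """Return sites that every received report describes as down.

    `reports` maps (observer, target) to True when the observer sees the
    target as up. A dead site sends nothing, so its own reports are absent.
    """
    flagged = []
    for target in sites:
        views = [up for (_, seen), up in reports.items() if seen == target]
        if views and not any(views):
            flagged.append(target)
    return flagged


# Hypothetical three-site example: site-c has gone silent.
sites = ["site-a", "site-b", "site-c"]
reports = {
    ("site-a", "site-b"): True,
    ("site-b", "site-a"): True,
    ("site-a", "site-c"): False,
    ("site-b", "site-c"): False,
}
down = sites_reported_down(sites, reports)
```

With N sites this gives N * (N - 1) cross-site checks, which stays small because only the Nagios servers and the WAN link endpoints are checked across sites.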