Re: [Nagios-users] High Latency with service checks
Yu Watanabe wrote:
Hello all. I would like some advice on a Nagios latency issue. In Nagios 3.0.6 on RHEL 4, is it possible for service check latency to climb even though sar and iostat do not show a particularly high load? (I am planning to upgrade to v3.2.3 soon.) SAR average CPU usage was 40% and iowait stayed at 0%. No swapping was occurring. I have more than 1000 ping checks, and the average latency is 20 minutes. Check execution time for every ping is below 1 second. The plugin I am using is check_icmp.
CPU: Intel(R) Xeon(TM) CPU 2.80GHz
Memory: 8GB
OS: RHEL 4.4
Is there any possibility that Nagios gets locked up in service check scheduling?
Thanks, Yu

Have you tried throttling the number of concurrent checks? Could it be that the 1000 pings are flooding your network?
I've encountered a similar issue with a setup I had (granted, the latency wasn't that extreme), and from what you describe the symptoms sound the same. We tried several solutions (DNX, Mod_gearman) to reduce the latency. The solution that worked in the end was adding extra RAM to the machine. I know that is not the best method, but none of the regular methods of tweaking Nagios (large installation tweaks, ramfs, etc.) worked, and the boost in RAM reduced the latency from 6+ minutes to 3 seconds.
Assaf
-- The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE: Pinpoint memory and threading errors before they happen. Find and fix more than 250 security defects in the development cycle. Locate bottlenecks in serial and parallel code that limit performance. http://p.sf.net/sfu/intel-dev2devfeb
___ Nagios-users mailing list Nagios-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] High Latency with service checks
Another possibility: do you use an addon that runs a task after every check? For example, updating RRD files for performance graphing can sometimes increase check latency.

On 2011/2/16, Yu Watanabe yu.watan...@jp.fujitsu.com wrote:
Thank you very much for the reply.

Assaf Flatto wrote:
Have you tried throttling the number of concurrent checks? Could it be that the 1000 pings are flooding your network?

I am not sure I understand what you mean by throttling... Is there a parameter in Nagios to control this? For flooding, I will check netstat -s periodically and see if there is too much traffic.

Assaf Flatto wrote:
We tried several solutions (DNX, Mod_gearman) to reduce the latency. The solution that worked in the end was adding extra RAM to the machine.

As far as I can see from vmstat, there seems to be enough memory left for buffers and cache, since there isn't any swapping.
Thanks, Yu
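Yu's question about a throttling parameter goes unanswered in the thread. For reference, the Nagios 3.x main configuration file has a max_concurrent_checks directive (0 means no limit), next to other directives that come up in these threads: use_large_installation_tweaks, enable_embedded_perl, obsess_over_services and obsess_over_hosts. The Python sketch below only reports how those are set in a given nagios.cfg; it assumes plain key=value lines with # comments, the function names are hypothetical, and it makes no claim about what the right values are.

```python
"""Hypothetical helper: report the nagios.cfg directives discussed in this thread."""
import sys

# Standard Nagios 3.x main-config directive names; picking these five is an
# assumption about what is worth looking at, not a tuning recommendation.
TUNING_OPTIONS = (
    "max_concurrent_checks",  # 0 = no limit on concurrently running service checks
    "use_large_installation_tweaks",
    "enable_embedded_perl",
    "obsess_over_services",
    "obsess_over_hosts",
)


def read_main_config(lines):
    """Parse key=value lines from nagios.cfg, skipping comments and blank lines."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        settings[key.strip()] = value.strip()
    return settings


def tuning_report(settings):
    """Return (directive, value) pairs; unset directives are flagged as defaults."""
    return [(opt, settings.get(opt, "<not set, default applies>")) for opt in TUNING_OPTIONS]


# Only runs when a config path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for opt, value in tuning_report(read_main_config(fh)):
            print(f"{opt} = {value}")
```

Usage would be something like python nagios_tuning.py /usr/local/nagios/etc/nagios.cfg; both the script name and that path (a common location for source installs) are assumptions, so point it at whatever file your Nagios actually loads.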
Re: [Nagios-users] High Latency with service checks
Thank you for the reply, Yueh. We graph the results, but not after every check: I spool the performance output to a file first.
Thanks, Yu

Yueh-Hung Liu wrote:
Another possibility: do you use an addon that runs a task after every check? For example, updating RRD files for performance graphing can sometimes increase check latency.
Re: [Nagios-users] high latency
On 12/11/2010 07:14 PM, Frost, Mark {PBC} wrote:
Andreas, it's just never worked for me, and I thought you'd mentioned some time ago that OP5's git site just didn't support it. I've validated that my version of git (1.7.1) will grab code from a public site via our corporate proxy, using other public code (the proxy is set up via the $http_proxy environment variable):

  $ git clone http://github.com/schacon/grack.git
  Initialized empty Git repository in /home/mfrost0/src/grack/.git/
  remote: Counting objects: 85, done.
  remote: Compressing objects: 100% (45/45), done.
  remote: Total 85 (delta 32), reused 80 (delta 31)
  Unpacking objects: 100% (85/85), done.

but...

  $ git clone http://git.op5.org/nagios/merlin.git merlin-src
  Initialized empty Git repository in /home/mfrost0/src/merlin-src/.git/
  fatal: http://git.op5.org/nagios/merlin.git/info/refs not found: did you run git update-server-info on the server?
  $ git clone http://git.op5.org/nagios.git nagios-src
  Initialized empty Git repository in /home/mfrost0/src/nagios-src/.git/
  fatal: http://git.op5.org/nagios.git/info/refs not found: did you run git update-server-info on the server?

so, you know :-(

Aight. I'll look into it tomorrow when I get to work. It's supposed to work anyway.
-- Andreas Ericsson andreas.erics...@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace.
Re: [Nagios-users] high latency
-----Original Message-----
From: Andreas Ericsson [mailto:a...@op5.se]
Sent: Tuesday, December 07, 2010 5:57 PM
To: Frost, Mark {PBC}
Cc: Nagios Users List
Subject: Re: [Nagios-users] high latency

Any chance that the OP5 site will eventually be configured to allow git through a proxy? It's less convenient to use snapshot tarballs, but still workable, of course.

You mean through http? Doesn't it already? I think it's supposed to. I can check up on that later. The gitweb page has links for grabbing the latest master as a tarball, though. That might work as an interim solution.
-- Andreas Ericsson andreas.erics...@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231

Andreas, it's just never worked for me, and I thought you'd mentioned some time ago that OP5's git site just didn't support it. I've validated that my version of git (1.7.1) will grab code from a public site via our corporate proxy, using other public code (the proxy is set up via the $http_proxy environment variable):

  $ git clone http://github.com/schacon/grack.git
  Initialized empty Git repository in /home/mfrost0/src/grack/.git/
  remote: Counting objects: 85, done.
  remote: Compressing objects: 100% (45/45), done.
  remote: Total 85 (delta 32), reused 80 (delta 31)
  Unpacking objects: 100% (85/85), done.

but...

  $ git clone http://git.op5.org/nagios/merlin.git merlin-src
  Initialized empty Git repository in /home/mfrost0/src/merlin-src/.git/
  fatal: http://git.op5.org/nagios/merlin.git/info/refs not found: did you run git update-server-info on the server?
  $ git clone http://git.op5.org/nagios.git nagios-src
  Initialized empty Git repository in /home/mfrost0/src/nagios-src/.git/
  fatal: http://git.op5.org/nagios.git/info/refs not found: did you run git update-server-info on the server?

so, you know :-(
Thanks
Mark
Re: [Nagios-users] high latency
-----Original Message-----
From: Andreas Ericsson [mailto:a...@op5.se]
Sent: Tuesday, December 07, 2010 9:44 AM

Hmm. So then I'd be very curious why the 2 distservers, which both use oc[sh]p commands the same way, have such radically different latencies.

Agreed. There must be other differences too. Perhaps there's trouble resolving from one of the nodes? That usually makes checks run a helluva lot longer than they normally have to.

I had another look. I found a test host that I'd made that was deliberately unreachable, but when I removed it, it made no difference. Execution times (min/max/avg) are significantly lower on the host with the high latencies than on the one with low latencies. I don't see any unresolvable hosts or, now, any unreachable hosts. Puzzling. I've always wished there was an easy way to see which processes had high latencies from the web interface without having to view the status.dat file...

Either way, you're suggesting that having a NEB module handle the post-check work will eliminate the serialization.

Yes. Sneaking a peek at what's needed in order for an event to get sent to master via an eventbroker, compared to running an oc[sh]p command, renders this, more or less:
[ good stuff snipped... ]
Wow. In terms of effort, the difference is sort of like either hopping on one leg along the entire Great Wall of China or walking to the kitchen to grab a beer.

parallelize_check is set to 1 everywhere.

Does one server have a lot of random service failures? On-demand host checks are still run in parallel.

I don't think so. Intermittent, you mean? Not as far as I know or can see.

What version of Nagios are you running?
3.2.1
I take it upgrading makes no difference?
To 3.2.3? I'll probably try that on the new servers, but if things work out I may just move to Merlin + 3.2.4. I wasn't sure I saw anything in the 3.2.3 release that I found compelling for us at the time. As I say, this system now has fairly high visibility, so just trying something like that would involve a rather painful internal change process. It's like piloting the QE2: I can't change course very quickly :-)

Thanks, Andreas. I'm hoping to allocate sufficient resources on the new servers to be able to play with Merlin more there.
It's quite resource-friendly, actually. Well, compared to what you're running now, it's positively feather-light.
I meant more like installing MySQL everywhere, building filesystems to hold the MySQL data, etc., not so much that I need more memory or more CPUs. I don't remember seeing anything in the Merlin docs (maybe I missed it), but how large would the MySQL database need to be? Pretty small on each box, right? Like 500MB or less?

Will I be able to have the performance data from a poller sent up to a NOC for digestion by pnp4nagios?
Yes, but you'll need the threadsafe version of Nagios, which you can obtain from either CVS or git://git.op5.org/nagios.git, for performance data to work. Actually, you need that for Merlin to work.
That's part of the plan. Any chance that the OP5 site will eventually be configured to allow git through a proxy? It's less convenient to use snapshot tarballs, but still workable, of course.

It may have been a long time ago, but I thought I remembered seeing that performance data was not yet implemented.
That was then. This is now :)
Spifftacular!

No, we'd be using some flavor of SLES.
Should work marvellously then.

Thanks as always for your help, Andreas.
Mark
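Mark's wish for an easy way to spot high-latency checks can be approximated outside the web interface. Below is a minimal Python sketch, assuming the Nagios 3.x status.dat layout in which each servicestatus { ... } block holds key=value lines including host_name, service_description, check_latency and check_execution_time. The helper names are hypothetical, and the file's location is whatever status_file points to in nagios.cfg.

```python
"""Hypothetical report of the highest-latency services in a Nagios 3.x status.dat."""
import sys


def parse_service_status(lines):
    """Yield one dict per 'servicestatus {' ... '}' block; other blocks are ignored."""
    block = None
    for raw in lines:
        line = raw.strip()
        if line == "servicestatus {":
            block = {}
        elif line == "}" and block is not None:
            yield block
            block = None
        elif block is not None and "=" in line:
            key, value = line.split("=", 1)  # plugin output may contain '='
            block[key] = value


def worst_latencies(lines, top=10):
    """Return (host, service, latency, exec_time) tuples, highest latency first."""
    rows = [
        (
            svc.get("host_name", "?"),
            svc.get("service_description", "?"),
            float(svc.get("check_latency", 0)),
            float(svc.get("check_execution_time", 0)),
        )
        for svc in parse_service_status(lines)
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


# Only runs when a status.dat path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for host, service, latency, exec_time in worst_latencies(fh):
            print(f"{latency:10.3f}s latency {exec_time:8.3f}s exec  {host} / {service}")
```

Printing execution time next to latency makes it easy to tell a slow plugin (high execution time) apart from a scheduling backlog (low execution time but high latency), which is the distinction the distserver comparison hinges on.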
Re: [Nagios-users] high latency
On 12/03/2010 07:59 PM, Daniel Wittenberg wrote:
It appears that Nagios spawns lots and lots of new processes for all the various tasks it does, check results and such. I was curious: wouldn't a model more like Apache's work better? Something like a queue of work, with worker processes that grab jobs off that queue, run a bunch of different jobs, and then die, rather than each performing just one task? That seems like it would still maintain stability and offer higher performance.

It probably would, and it's on the roadmap to rewrite those parts of Nagios into something similar to what you've described.
-- Andreas Ericsson andreas.erics...@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231
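The model Daniel describes is roughly Apache's prefork MPM: a fixed set of worker processes pulls jobs from a shared queue, and each worker is recycled after a number of jobs. The Nagios versions discussed here do not work this way; the sketch below only illustrates the idea in Python using multiprocessing.Pool, whose maxtasksperchild argument retires a worker after that many tasks. run_check and run_checks are hypothetical names, and run_check merely records which process handled a job instead of executing a real plugin.

```python
"""Sketch of an Apache-prefork-style worker pool for running checks."""
import multiprocessing
import os


def run_check(check_name):
    """Stand-in for executing one plugin; returns which worker process ran it."""
    # A real worker would exec the plugin here (e.g. with subprocess.run) and
    # collect its exit code and output; this sketch only records the PID.
    return check_name, os.getpid()


def run_checks(check_names, workers=4, jobs_per_worker=50):
    """Run checks on a fixed pool of worker processes that are periodically recycled.

    Workers take jobs off the pool's internal task queue; after completing
    jobs_per_worker tasks a worker exits and the pool starts a fresh one.
    """
    # "fork" mirrors the prefork model (and avoids re-importing __main__ in
    # children); it is not available on Windows.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers, maxtasksperchild=jobs_per_worker) as pool:
        # chunksize=1 makes every check count as one task toward the recycle limit.
        return pool.map(run_check, check_names, chunksize=1)
```

With workers=2 and jobs_per_worker=3, ten checks must be spread over at least four distinct processes, which is the "run a bunch of jobs, then die" behaviour. The trade-off is the same one Daniel points at: process creation is amortized over several jobs, while recycling still limits the damage a leaky or misbehaving job can do.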
Re: [Nagios-users] high latency
On 12/03/2010 08:14 PM, Frost, Mark {PBC} wrote:
Can the use of dependencies also be a cause of increased latencies?

If they're very deep, it's possible. Otherwise it really shouldn't matter all that much. It will of course add *some* load, but it shouldn't be enough to cause latency.

I too struggle with them, and I'm running on lightly loaded physical hardware. We have 2 servers doing the checks and sending results back to a central server. Both distributed nodes use ocsp/ochp, but the commands do nothing more than append results to a file (i.e. they exit quickly). Results are handled outside of Nagios.

Try getting rid of the oc[sh]p commands and use Merlin, or google for pnsca or persistent nsca. There's one available from op5's repositories that may or may not work, and there's one from somewhere else that they're apparently using to great effect. Even if the command exits quickly, it's still executed serially, so checking halts for a short period for each and every check that runs.

What's odd is that distserver1 and distserver2 are configured the same:

distserver1:
  Hosts Checked: 675
  Services Checked: 4179
  Active Service Latency (min/max/avg): 0.000 / 3.155 / 0.382 sec
  Active Service Execution Time (min/max/avg): 0.000 / 60.038 / 0.145 sec
distserver2:
  Hosts Checked: 261
  Services Checked: 4289
  Active Service Latency (min/max/avg): 0.000 / 169.977 / 81.300 sec
  Active Service Execution Time (min/max/avg): 0.000 / 15.270 / 0.211 sec

yet as you can see, distserver2's latency is much higher and always has been. I tried turning off EPN yesterday on distserver2 and it had no discernible effect. We added 400 new service checks yesterday on distserver2 (just more of the same checks we already do, but on 26 new hosts) and the latency went from 35 to over 80 seconds.

What kind of checks are you running? Some plugins draw a lot of CPU. Are any of the checks set to run serially (grep for parallelize_check in your objects.cache file)? What version of Nagios are you running?

The checks we do are very different (Windows, Linux, Unix, many are app-centric), so it's difficult to compare exactly what runs on distserver1 and distserver2. But given yesterday's jump, and the fact that the checks on these new hosts are all built on dependencies, I wonder whether that has something to do with it. These hosts (Windows) have a basic check for NRPE, and all other checks on the host depend on the NRPE check succeeding. I have to move to all-new Nagios servers very soon. I'm interested in Merlin, but given that it isn't production-ready just yet, I'm hesitant to commit, and I'm not sure it will help me here.

It's been running at 400+ of our customers with very few problems for the past month. 0.9.1, released just yesterday, solves the known issues our customers have encountered. You might want to take a look at it again. There are some issues on FreeBSD though (was that you reporting them?). I just recently got a new laptop with better support for running virtual systems, so I'm downloading a FreeBSD 8.1 install DVD as we speak. Hopefully I'll have those issues sorted out before the end of the week.
-- Andreas Ericsson andreas.erics...@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231
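Andreas's suggestion to grep objects.cache for parallelize_check can be narrowed to just the services that are checked serially. The sketch below assumes the Nagios 3.x objects.cache layout: define service { ... } blocks, one whitespace-separated directive and value per line, closed by a lone }. serial_services is a hypothetical name; the file's location is set by object_cache_file in nagios.cfg.

```python
"""Hypothetical scan of objects.cache for services that are checked serially."""
import sys


def serial_services(lines):
    """Return (host, service) pairs whose cached definition has parallelize_check 0."""
    found = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("define "):
            # Only 'define service {' counts; servicedependency and similar are skipped.
            words = line.split()
            current = {} if words[1].rstrip("{") == "service" else None
        elif line == "}" and current is not None:
            if current.get("parallelize_check") == "0":
                found.append((current.get("host_name", "?"),
                              current.get("service_description", "?")))
            current = None
        elif current is not None and line:
            # Directive and value are separated by whitespace; values may contain spaces.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return found


# Only runs when an objects.cache path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for host, service in serial_services(fh):
            print(f"serial check: {host} / {service}")
```

An empty result is consistent with Mark's report that parallelize_check is 1 everywhere; it says nothing about oc[sh]p commands, which Andreas notes are executed serially regardless of this setting.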
Re: [Nagios-users] high latency
I did some testing today with epn on and off, and it didn't seem to make any difference in our latency times. Not overly scientific, but it looked about the same running a few hours each way.
Dan

-----Original Message-----
From: Max Schubert [mailto:m...@webwizarddesign.com]
Sent: Friday, December 03, 2010 7:03 AM
To: Andreas Ericsson; Nagios Users List
Subject: Re: [Nagios-users] high latency

Latency increases much more quickly for us without epn, as execution times are noticeably longer per check. We use RHEL 5.x, so the Perl is 5.8.8. We have semi-daily updates to our pollers, and with epn that means cold restarts; memory leaks have not been noticeable given that scenario, but on test hosts or hosts where we are doing burn-ins the leakage is negligible enough that we can go for 2-3 days with no memory issues. We always hit service latency thresholds first. In general, 7 seconds of latency is where we have to force a restart of our pollers to prevent metric collection and SNMP delta calculation issues.
Max
Re: [Nagios-users] high latency
On 12/02/2010 06:42 PM, Daniel Wittenberg wrote:
Embedded Perl is interesting though; I hadn't tried that. I thought it was supposed to help with performance.

In theory, it does. It probably does in practice too, but the problems associated with it make it not worth it.

I don't think we have any obsessive stuff running right now.

Check if you're not sure.

Right now the hardware is a 4-CPU VMware ESX VM with 4GB RAM. For production there will be 12 of those boxes, with about 1200-1500 hosts per Nagios server.

Virtual systems. Bleh. Anyway, if you're going to use a load-balanced setup, you should look into using Merlin. That way you get complete failover for free.
-- Andreas Ericsson andreas.erics...@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231
Re: [Nagios-users] high latency
On 12/02/2010 08:38 PM, Daniel Wittenberg wrote:
Someone else noticed that Nagios is generating a ton of minor page faults. I'm curious whether that's normal and whether it could be causing some of the latency in the checks.

Define "a ton".

  $ /usr/bin/time php -r 'echo "marsipulami\n";'
  marsipulami
  0.01user 0.01system 0:00.09elapsed 34%CPU (0avgtext+0avgdata 29104maxresident)k
  10208inputs+0outputs (70major+1962minor)pagefaults 0swaps

That's with a reasonably simple program, and it generates 70 major and 1962 minor page faults.

I've also got a tmpfs set up for status.dat and the checkresults directory to ease some of the disk I/O, since we're on a SAN-backed VM host.

That's good, although if you're using a virtual system you'll never know for sure whether you're really using a ramdisk, since the host system might well use swap to store the ramdisk anyway.

I turned off embedded Perl this morning and our latency has been holding at 10 seconds so far, so that seemed to help a lot.

Neat. Did it affect your page faults? If so, how?
-- Andreas Ericsson andreas.erics...@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231
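To answer "define a ton" for a particular plugin rather than for php, the same major/minor fault counters that /usr/bin/time prints can be read from Python via resource.getrusage(resource.RUSAGE_CHILDREN), which is Unix-only. A minimal sketch; child_page_faults is a hypothetical name, and since absolute numbers vary by system, the useful comparison is a plugin against a trivial baseline command on the same host.

```python
"""Measure page faults of one child command, like the summary /usr/bin/time prints."""
import resource
import subprocess
import sys


def child_page_faults(argv):
    """Run argv to completion and return the (major, minor) page faults it incurred.

    RUSAGE_CHILDREN accumulates over all waited-for children of this process,
    so the counts are taken as a before/after difference.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_majflt - before.ru_majflt, after.ru_minflt - before.ru_minflt


# Only runs when a command is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    major, minor = child_page_faults(sys.argv[1:])
    print(f"{major} major + {minor} minor page faults")
```

For example, python pagefaults.py /bin/true gives a baseline to compare a plugin run against (the script name is hypothetical). Note that check=True means a plugin returning WARNING or CRITICAL (non-zero exit) would raise; drop it if you want to measure such runs too.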
Re: [Nagios-users] high latency
I find it interesting that a number of users get performance improvements with embedded Perl off; we lose 20-40% polling capacity on our Perl pollers with it off.
- Max
Re: [Nagios-users] high latency
On 12/03/2010 12:46 PM, Max Schubert wrote:
> I find it interesting that a number of users get performance improvements with embedded perl off - we lose 20-40% polling capacity per poller with it off.

How do you mean that you're losing capacity? Does latency start to creep upwards or is load increasing?

Out of interest: how much memory does epn leak nowadays, and which perl version is it compiled against?

-- 
Andreas Ericsson
Re: [Nagios-users] high latency
Latency increases much more quickly for us without epn, as execution times are noticeably longer per check. We use RHEL 5.x, so the perl is 5.8.8.

We have semi-daily updates to our pollers, and with epn that means cold restarts. Memory leaks have not been noticeable given that scenario, but on test hosts or hosts where we are doing burn-ins the leakage is negligible enough that we can go 2-3 days with no memory issues; we always hit service latency thresholds first. 7 seconds is in general where we have to force a restart of our pollers to prevent metric collection and SNMP delta calculation issues.

Max
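Max's rule of thumb (restart a poller once service latency reaches about 7 seconds) is straightforward to automate from nagiostats output. A sketch, assuming the "Active Service Latency: min / max / avg sec" line format that appears in Mark's stats later in this thread, and assuming the average is the figure to compare; what you do when it trips is up to you.

```python
import re

# Matches e.g. "Active Service Latency:  0.000 / 169.977 / 81.300 sec"
LATENCY_RE = re.compile(
    r"Active Service Latency:\s*([\d.]+)\s*/\s*([\d.]+)\s*/\s*([\d.]+)\s*sec"
)


def parse_service_latency(nagiostats_output):
    """Return (min, max, avg) active service latency in seconds, or None."""
    m = LATENCY_RE.search(nagiostats_output)
    if m is None:
        return None
    return tuple(float(x) for x in m.groups())


def needs_restart(nagiostats_output, threshold_s=7.0):
    """True when average active-service latency meets or exceeds threshold_s."""
    parsed = parse_service_latency(nagiostats_output)
    return parsed is not None and parsed[2] >= threshold_s


if __name__ == "__main__":
    print(needs_restart("Active Service Latency: 0.000 / 3.155 / 0.382 sec"))     # False
    print(needs_restart("Active Service Latency: 0.000 / 169.977 / 81.300 sec"))  # True
```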
Re: [Nagios-users] high latency
Pagefaults - 20-30k. This seems to be the source of most of the cpu system time (understandably), which sits at about 40-50%. So if I could reduce the pagefaults I think we could gain quite a bit of performance back.

I found one other huge issue... somehow in the generic service check, check_interval was set to 5 minutes, but normal_check_interval wasn't set at all and it appeared to be checking every minute. I deleted check_interval and added normal_check_interval and that helped a ton: latency went down to 0.5-1.5 seconds. That was with only 2 active checks and about a dozen passive ones running on 700 hosts. I then added back in the other 9 active checks and latency once again shot back up to about 2000 seconds *sigh*.

I grabbed another vm and made it a dnx client and that seemed to help, but I wish I could get the main server to handle more. Right now it has about 700 hosts and 12,100 service checks, of which about 7000 are active and the rest are passive. Oh, and we do have obsessive turned off. I've even gone through as many configs as I could and removed the macros too, until I can write a caching mechanism for the macro statements.

Any more ideas?
Re: [Nagios-users] high latency
On 12/03/2010 04:31 PM, Daniel Wittenberg wrote:
> Pagefaults - 20-30k. This seems to be the source of most of the cpu system time (understandably), which sits about 40-50%. So if I could reduce the pagefaults I think we could gain quite a bit of performance back.

Over what period of time? Here's the output from a program running a mere 1.22s, showing 13k pagefaults. The majority of that time is *not* spent trying to load the swapped-out mmap regions, but on delta chain lookups inside the program logic:

$ time git repack
Counting objects: 397, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (397/397), done.
Writing objects: 100% (397/397), done.
Total 397 (delta 238), reused 0 (delta 0)
0.28user 0.09system 0:01.22elapsed 30%CPU (0avgtext+0avgdata 20544maxresident)k
6368inputs+464outputs (297major+12959minor)pagefaults 0swaps

I really think you're misunderstanding what pagefaults are and how they work. Starting an X server or openoffice.org is likely to generate somewhere around a million pagefaults each, simply because they use a lot of libraries, read a lot of config files, invoke a lot of helper programs and attempt to access various devices. 20-30k pagefaults is *nothing* for a cpu capable of executing a couple of billion instructions per second.

> I found one other huge issue...somehow in the generic service check, the check_interval was set to 5 minutes...however, normal_check_interval wasn't set at all and appeared to be checking every minute. I deleted check_interval and added normal_check_interval and that helped a ton, latency went down to 0.5-1.5 seconds. That was only running 2 active checks and about a dozen passive on 700 hosts. I then added back in the other 9 active checks and latency once again shot back up to about 2000 *sigh*.

You're doing something weird. I'm 100% certain that this isn't Nagios' fault. Any chance you could share your config off-list? Remove passwords and addresses first if you like.

> I grabbed another vm and made it a dnx client and that seemed to help, but wish I could get the main server to handle more. Right now it has about 700 hosts and 12,100 service checks, of which about 7000 are active and rest are passive.

Umm... First you said you added 9 checks and that made the entire thing just blow up, and now you're running 7000 active checks. What checks are you running? If you sort by cpu usage in top, is there any one that's really prominent?

-- 
Andreas Ericsson
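Andreas's "over what period of time?" is the key question: a page-fault count means little without a time base. On Linux the cumulative counters for a process are fields 10-13 of /proc/<pid>/stat (minflt, cminflt, majflt, cmajflt, per proc(5)), so sampling them twice gives a rate. A minimal sketch; the Nagios daemon's pid is something you supply yourself, for example from the file named by lock_file in nagios.cfg.

```python
import time


def parse_fault_counters(stat_text):
    """Extract page-fault counters from the contents of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may contain
    spaces or parentheses itself, so split after the last ')'.
    """
    rest = stat_text.rpartition(")")[2].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return {
        "minflt": int(rest[7]),    # field 10: minor faults of the process
        "cminflt": int(rest[8]),   # field 11: minor faults of waited-for children
        "majflt": int(rest[9]),    # field 12: major faults of the process
        "cmajflt": int(rest[10]),  # field 13: major faults of waited-for children
    }


def fault_rates(pid, interval_s=10.0):
    """Sample /proc/<pid>/stat twice and return page faults per second (Linux only)."""
    def read():
        with open(f"/proc/{pid}/stat") as f:
            return parse_fault_counters(f.read())

    before = read()
    time.sleep(interval_s)
    after = read()
    return {k: (after[k] - before[k]) / interval_s for k in before}
```

For example, `fault_rates(nagios_pid, 60.0)` reports per-second rates over a minute. Because Nagios forks a child per check, most of the faults are likely to show up in the children's counters (cminflt, cmajflt) rather than the daemon's own.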
Re: [Nagios-users] high latency
Sorry for the confusion on that... I added 9 checks to *each* host, and there are about 700 hosts.

No, it's all the nagios daemon itself (nagios -uxd). It feels like when I add that many more checks, it has a hard time running the checks and processing the results, since if I either move the active checking to dnx or drop those checks completely, the load and latency drop.

Dan
Re: [Nagios-users] high latency
It appears that nagios spawns lots and lots of new procs for all the various tasks it does, check results and such. I was curious: wouldn't a model more like Apache's work better? Something like a queue for work, with worker processes that grab jobs off that queue, run a bunch of different jobs, then die, rather than each process performing just one task. That seems like it would still maintain stability and offer higher performance.

Dan

-----Original Message-----
From: Andreas Ericsson [mailto:a...@op5.se]
Sent: Friday, December 03, 2010 5:22 AM
To: Daniel Wittenberg
Cc: Nagios Users List
Subject: Re: [Nagios-users] high latency

On 12/02/2010 06:42 PM, Daniel Wittenberg wrote:
> Embedded perl is interesting though, I hadn't tried that, thought it was supposed to help with performance.

In theory, it does. It probably does in practice too, but the problems associated with it make it not worth it.

> I don't think we have any obsessive stuff running right now.

Check if you're not sure.

> Right now hardware is 4 proc vmware esx, 4GB RAM. For production there will be 12 of those boxes with the number of hosts being about 1200-1500 per nagios server.

Virtual systems. Bleh.

Anyways, if you're going to use a loadbalanced setup you should look into using Merlin. That way you get complete failover for free.

-- 
Andreas Ericsson
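Dan's proposal is essentially Apache's prefork model with MaxRequestsPerChild: a fixed pool of long-lived workers pulls jobs from a shared queue, and each worker is replaced after a set number of jobs so that leaks cannot accumulate. Python's multiprocessing.Pool supports this directly through `maxtasksperchild`. The sketch below illustrates the scheduling model only; it is not how Nagios 3.x executes checks, and `run_check` is a hypothetical stand-in for running a plugin.

```python
import os
from multiprocessing import get_context


def run_check(service):
    """Hypothetical stand-in for executing one plugin.

    Returns (service name, pid of the worker that ran it, exit code).
    """
    return service, os.getpid(), 0


def run_all(services, workers=4, jobs_per_worker=50):
    """Run checks on a fixed pool of workers pulling from a shared queue.

    Each worker handles at most `jobs_per_worker` tasks and is then replaced
    by a fresh process, similar to Apache's MaxRequestsPerChild. The "fork"
    start method is requested explicitly, so this is Unix-only.
    """
    ctx = get_context("fork")
    with ctx.Pool(processes=workers, maxtasksperchild=jobs_per_worker) as pool:
        # chunksize=1 so every check counts as one task toward the recycle limit.
        return pool.map(run_check, services, chunksize=1)


if __name__ == "__main__":
    results = run_all([f"host{i}/PING" for i in range(200)])
    worker_pids = {pid for _, pid, _ in results}
    print(f"{len(results)} checks run by {len(worker_pids)} worker processes")
```

The trade-off is the one Andreas raises about EPN: a long-lived worker amortizes startup cost but also carries any leaks or corrupted state until it is recycled, so the recycle limit is the knob that balances speed against stability.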
Re: [Nagios-users] high latency
Can the use of dependencies also be a cause of increased latency? I too struggle with high latency, and I'm running on lightly-loaded physical hardware.

We have 2 servers doing the checks and sending results back to a central server. Both distributed nodes use ocsp/ochp, but these do nothing more than append results to a file (i.e. they exit quickly). Results are handled outside of Nagios. What's odd is that distserver1 and distserver2 are configured the same:

distserver1:
Hosts Checked: 675
Services Checked: 4179
Active Service Latency: 0.000 / 3.155 / 0.382 sec
Active Service Execution Time: 0.000 / 60.038 / 0.145 sec

distserver2:
Hosts Checked: 261
Services Checked: 4289
Active Service Latency: 0.000 / 169.977 / 81.300 sec
Active Service Execution Time: 0.000 / 15.270 / 0.211 sec

yet as you can see, distserver2's latency is much higher and always has been. I tried turning off EPN yesterday on distserver2 and it had no discernible effect. We added 400 new service checks yesterday on distserver2 (just more of the same checks we already do, but on 26 new hosts) and the latency went from 35 to over 80 seconds.

The checks we do are very different (Windows, Linux, Unix, many are app-centric), so it's difficult to compare exactly what runs on distserver1 and distserver2. But given yesterday's jump, and the fact that the checks on these new hosts are all built on dependencies, I wonder whether that has something to do with it. These hosts (Windows) have a basic check for NRPE, and all other checks on the host depend on the NRPE check succeeding.

I have to move to all-new Nagios servers very soon. I'm interested in Merlin, but given that it isn't production-ready yet, I'm hesitant to commit, and I'm not sure it will help me here.

Thanks
Mark
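Mark's figures also show that latency and execution time are independent: distserver1 has a 60-second maximum execution time yet sub-second average latency. Nagios defines check latency as the difference between when a check was scheduled to run and when it actually started, which also answers Dan's earlier question about how latency is calculated. A small sketch of that definition, using hypothetical timestamps and printing the min / max / avg layout that nagiostats uses:

```python
from statistics import mean


def latency_summary(checks):
    """Return (min, max, avg) check latency in seconds.

    `checks` is a non-empty iterable of (scheduled_ts, started_ts) pairs in
    Unix seconds. Latency is how late each check started relative to its
    schedule; how long the plugin then ran (execution time) is a separate
    figure and is deliberately not included.
    """
    latencies = [started - scheduled for scheduled, started in checks]
    return min(latencies), max(latencies), mean(latencies)


def format_like_nagiostats(label, summary):
    """Render a summary in the 'min / max / avg sec' layout nagiostats prints."""
    lo, hi, avg = summary
    return f"{label}: {lo:.3f} / {hi:.3f} / {avg:.3f} sec"


if __name__ == "__main__":
    # Hypothetical timestamps: three checks started 0 s, 3 s and 6 s late.
    checks = [(1000.0, 1000.0), (1000.0, 1003.0), (1060.0, 1066.0)]
    print(format_like_nagiostats("Active Service Latency", latency_summary(checks)))
    # Active Service Latency: 0.000 / 6.000 / 3.000 sec
```

This is why idle CPU and high latency can coexist: latency measures how far behind the scheduler is in *starting* checks, not how busy the machine is while they run.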
Re: [Nagios-users] high latency
Yeah, for giggles I went further back through the archives last night and found stuff going back to the 2.x series, and not much of it seemed to help. I killed some of my misbehaving active checks, and latency dropped to about 20 seconds, then went up to about 35-50. So while that's better, I have A LOT more hosts and service checks to add, and I'm afraid it'll go nuts when I dump more on. I think I've tried just about all the config options I could find; some helped, some didn't seem to, but there should be plenty of horsepower on the machine to run this much faster, so I'm not sure why it's not.

Dan
Re: [Nagios-users] high latency
Hey Dan,

I too have been wrestling alligators with service and host check latencies averaging around 60s, and increasing to 100+ (sometimes to 300) after a few reloads during the day.

This morning, I enabled the use_large_installation_tweaks option. As of a minute ago, my host check latency is now averaging 2.116s, and service check latency is averaging 0.748s.

I didn't see whether you had tried this yet; it might be something to consider.

Benny

-- 
No matter how many shorts we have in the system, my guards will be instructed to treat every surveillance camera malfunction as a full-scale emergency.
  -- Peter Anspach's Evil Overlord List, #67
Re: [Nagios-users] high latency
Yeah, been running that since day one, since when rollout is done we'll probably have about 18k servers and around 3 million service checks... I can probably post my relevant config options if someone wants to peek.

Dan
Re: [Nagios-users] high latency
On 12/02/2010 04:59 PM, Daniel Wittenberg wrote:
> [...]
> On 01/12/10 16:00, Daniel Wittenberg wrote:
>> I've been watching my latency graphs, and they're showing 2000 seconds for some service and host checks. What I don't understand is that I still have idle time on the CPU (quad processor), so if the server isn't in trouble, why am I seeing such high latency? Or maybe I misunderstand how latency is calculated? I do have 9 service checks that are failing on about 700 hosts, if that matters at all. Trying to tweak the performance to the max on this, so any insight welcome.

Ditch your performance-data processing and see if that helps. You might also want to get rid of embedded perl. It's been known to cause really weird errors (although primarily memory leaks). You'll also want to get rid of obsessive host and service commands.

How large is your installation and what hardware and system are you running it on?

-- 
Andreas Ericsson
Re: [Nagios-users] high latency
On 12/02/2010 06:05 PM, Daniel Wittenberg wrote:
> Yeah, been running that since day one, since when rollout is done we'll probably have about 18k servers and around 3 million service checks...

170 services per host? Sounds like an awful lot of switches. I'd use some cleverness to grab the SNMP info once and parse the data afterwards if I were you.

For that kind of installation, you'll need to use a distributed setup of some sort. Merlin, dnx and apparently mod-gearman should get you going in the right direction.

-- 
Andreas Ericsson
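Andreas's "170 services per host" is a rounded figure. The same back-of-the-envelope arithmetic shows why a single scheduler can't carry the planned load. The sketch assumes, as an upper bound (Dan says most checks will be passive), that every service were active and checked every 5 minutes, the interval from Dan's generic template; the 12-server split is Dan's stated production plan.

```python
def services_per_host(services, hosts):
    """Average number of service checks per monitored host."""
    return services / hosts


def checks_per_second(active_checks, interval_s):
    """Average rate at which the scheduler must start checks."""
    return active_checks / interval_s


if __name__ == "__main__":
    print(round(services_per_host(3_000_000, 18_000)))       # 167
    print(checks_per_second(3_000_000, 300))                 # 10000.0
    print(round(checks_per_second(3_000_000, 300) / 12, 1))  # 833.3 per server
    print(round(checks_per_second(7_000, 300), 1))           # 23.3 (today's 7000 active)
```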
Re: [Nagios-users] high latency
Not using SNMP for any of the checks, and most are passive checks. For the few active checks we are probably going to be using dnx.

Embedded perl is interesting though; I hadn't tried that, I thought it was supposed to help with performance. I don't think we have any obsessive stuff running right now.

Right now the hardware is a 4-proc VMware ESX VM with 4GB RAM. For production there will be 12 of those boxes, with about 1200-1500 hosts per nagios server.

Dan
Re: [Nagios-users] high latency
Someone else noticed that nagios is generating a ton of minor page faults, and I'm curious whether that's normal and whether it could be causing some of the latency in the checks.

I've also got a tmpfs setup for the status.dat and the checkresults directory to ease some of the disk i/o, since we're on a SAN-backed VM host.

I turned off embedded perl this morning and our latency has been holding at 10 seconds so far, so that seemed to help a lot.

Dan
Re: [Nagios-users] high latency
Dan, there have been a couple of discussions on the list that dealt with latency issues. Have you tried looking through the list archives for that topic?

Assaf

On 01/12/10 16:00, Daniel Wittenberg wrote:
I've been watching my latency graphs, and they're showing 2000 seconds for some service and host checks. What I don't understand is that I still have idle time on the CPU (quad processor), so I'm curious: if the server isn't in trouble, why am I seeing such high latency? Or maybe I misunderstand how latency is calculated? I do have 9 service checks that are failing on about 700 hosts, if that matters at all. I'm trying to tweak the performance to the max on this, so any insight is welcome.

Thanks,
Dan

--
Never, Ever Cut A Deal With a Dragon
Next year I will be doing the London to Paris bike ride to raise money for the DogTrust (www.dogstrust.co.uk). Please sponsor me at http://www.justgiving.com/Assaf-Flatto
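On how latency is calculated: in Nagios, check latency is the delay between the time a check was scheduled to run and the time it actually started. It measures how far the scheduler has fallen behind, not how busy the CPU is, which is why idle CPU and high latency can coexist when something else (a concurrency limit, a slow event broker module, result processing) is the bottleneck. The sketch below is illustrative only; the `CheckRun` type and helper names are hypothetical, and it simply reproduces the min / max / average triple that nagiostats reports.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CheckRun:
    scheduled: float  # epoch seconds the check was scheduled to run
    started: float    # epoch seconds the check actually started


def latency(run: CheckRun) -> float:
    """How late the check started relative to its scheduled time."""
    return run.started - run.scheduled


def summarize(runs: list[CheckRun]) -> tuple[float, float, float]:
    """Return (min, max, average) latency across a set of check runs."""
    values = [latency(r) for r in runs]
    return min(values), max(values), mean(values)
```

A check scheduled at t=200 that only starts at t=230 contributes 30 seconds of latency regardless of how quickly the plugin itself finishes.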
Re: [Nagios-users] High latency on small installation
On Jul 20, 2010, at 11:36 AM, Assaf Flatto wrote:
Hello All,
I am having a problem with very high latency on my main Nagios server (3.2.0 from source on SLES 10.3 x64). I recompiled the core with embedded Perl, and that helped lower the latency for a while, but it keeps growing to times that are not reasonable for a Nagios installation of this size.

event_broker_options=-1
broker_module=/usr/local/nagios/bin/ndomod-3x.o

Is it better if you disable the event broker? If so, search the archives for information about it and database tuning. There has been somewhat recent discussion about higher latency as the database grows in size.

process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata

Is it better if you disable this? If so, see if there's any performance tuning information for the addon you are using.

enable_environment_macros=1

Disable this if you are not explicitly using it. Chances are very high that you are not.

--
Marc
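Marc's suggestions amount to auditing a handful of nagios.cfg directives. A small sketch of that audit, assuming a plain `key=value` file with `#` comments, parses the config and lists which of the directives discussed in these threads are switched on. The helper names are hypothetical, and enable_embedded_perl is included only because turning off embedded Perl helped Dan earlier in the thread; whether any of these is actually your bottleneck still has to be tested one change at a time.

```python
def parse_nagios_cfg(text: str) -> dict[str, list[str]]:
    """Parse key=value lines; repeated keys (e.g. broker_module) keep every value."""
    settings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that is not key=value
        key, _, value = line.partition("=")
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def latency_suspects(settings: dict[str, list[str]]) -> list[str]:
    """List the directives from these threads that are enabled in the config."""
    suspects = []
    if settings.get("broker_module"):
        suspects.append("broker_module (event broker, e.g. ndomod)")
    for key in ("process_performance_data", "enable_environment_macros",
                "enable_embedded_perl"):
        # The last occurrence wins; a missing key is treated as disabled here.
        if settings.get(key, ["0"])[-1] == "1":
            suspects.append(key)
    return suspects
```

Feeding it the excerpt quoted above would flag the broker module, performance data processing and environment macros.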
Re: [Nagios-users] High latency on small installation
event_broker_options=-1
broker_module=/usr/local/nagios/bin/ndomod-3x.o

Is it better if you disable the event broker? If so, search the archives for information about it and database tuning. There has been somewhat recent discussion about higher latency as the database grows in size.

I was part of that thread, and I cannot remove it since we need it for NagVis.

process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata

Is it better if you disable this? If so, see if there's any performance tuning information for the addon you are using.

Again, needed for the pnp4nagios we use for our graphs.

enable_environment_macros=1

Disable this if you are not explicitly using it. Chances are very high that you are not.

That hit the spot right on.

Active Service Latency: 5.034 / 462.816 / 197.907 sec

and dropping. Thanks, Marc.
Re: [Nagios-users] High latency on small installation
Guess my joy was premature: after the change the latency dropped all the way to 170 sec and then started climbing back up. Now it stands at

Active Service Latency: 5.679 / 441.738 / 384.102 sec

I have removed the NDO broker module, and that helped by lowering the latency to

Active Service Latency: 0.129 / 441.738 / 268.447 sec

It does mean I will lose NagVis and the nagiosBP plugin, but for now it will have to do.
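When comparing before/after numbers like these, it is easy to misread which figure moved. A minimal sketch, assuming the three values on a nagiostats "Active Service Latency" line are min / max / average (as nagiostats labels them), pulls out the triple with a regular expression so the averages can be compared directly. The function name is hypothetical.

```python
import re

# Tolerates missing or extra whitespace around ':' and '/'.
LATENCY_RE = re.compile(
    r"Active Service Latency:\s*([\d.]+)\s*/\s*([\d.]+)\s*/\s*([\d.]+)\s*sec"
)


def parse_latency(line: str) -> tuple[float, float, float]:
    """Return (min, max, avg) from a nagiostats 'Active Service Latency' line."""
    match = LATENCY_RE.search(line)
    if match is None:
        raise ValueError(f"no latency triple in: {line!r}")
    lo, hi, avg = (float(g) for g in match.groups())
    return lo, hi, avg


before = parse_latency("Active Service Latency: 5.679 / 441.738 / 384.102 sec")
after = parse_latency("Active Service Latency:0.129 / 441.738 / 268.447 sec")
improvement = before[2] - after[2]
print(f"average latency fell by {improvement:.3f} sec")
```

Note that the maximum is identical in both readings, so the drop is entirely in the minimum and the average.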
Re: [Nagios-users] High latency on small installation
Could it be that the more service checks I move off the main Nagios server and onto the monitored hosts to run via NRPE, the more the latency increases?

Assaf
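One way to reason about the NRPE question: check_nrpe is itself a plugin that the Nagios server runs locally and that waits for the remote NRPE daemon to answer, so moving a check to NRPE does not take it off the server's scheduler, and it can make each check take longer from the server's point of view. A rough steady-state estimate (Little's law: items in flight = arrival rate × time in system) shows how longer execution times raise the number of checks that must run concurrently. The numbers and helper names below are hypothetical, and treating 0 as unlimited assumes the documented meaning of max_concurrent_checks=0.

```python
def required_concurrency(num_checks: int, interval_sec: float,
                         avg_exec_sec: float) -> float:
    """Little's law estimate: checks in flight = start rate * time per check."""
    start_rate = num_checks / interval_sec  # checks started per second
    return start_rate * avg_exec_sec


def can_keep_up(num_checks: int, interval_sec: float, avg_exec_sec: float,
                max_concurrent: int) -> bool:
    """True if the estimated concurrency fits under the concurrency limit.

    A limit of 0 is treated as unlimited (assumption, mirroring
    max_concurrent_checks=0 in nagios.cfg).
    """
    if max_concurrent == 0:
        return True
    needed = required_concurrency(num_checks, interval_sec, avg_exec_sec)
    return needed <= max_concurrent


# Hypothetical numbers: 1,000 checks every 5 minutes.
local = required_concurrency(1000, 300, 0.5)  # fast local plugins
nrpe = required_concurrency(1000, 300, 3.0)   # slower round trips via check_nrpe
print(f"local: {local:.1f} in flight, via NRPE: {nrpe:.1f} in flight")
```

If the estimate exceeds the configured limit, checks queue up waiting for a slot and that wait shows up as latency, even though each individual check still completes quickly.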