Re: [Nagios-users] High Latency with service checks

2011-02-16 Thread Assaf Flatto
Yu Watanabe wrote:
 Hello All.

 I would like some advice on a Nagios latency issue.

 With Nagios 3.0.6 on RHEL 4, is it possible for service check latency to
 climb even though sar and iostat show relatively low load? (I am planning
 to upgrade to v3.2.3 soon.)

 The sar average CPU usage was 40%, iowait stayed at 0%, and no swapping
 was occurring.

 I have more than 1000 ping checks and the average latency is 20 minutes.
 Check execution time for every ping check is below 1 second. The plugin
 I am using is check_icmp.

 CPU: Intel(R) Xeon(TM) CPU 2.80GHz
 Memory: 8GB
 OS: RHEL 4.4

 Is there any possibility that Nagios gets locked up in service check
 scheduling?

 Thanks,
 Yu


   
Have you tried throttling the number of concurrent checks?
Could it be that the 1000 pings are flooding your network?

I've encountered a similar issue with a setup I had (granted, the
latency wasn't that extreme), and from what you describe the symptoms
sound the same.

We tried several solutions (DNX, Mod_gearman) to reduce the latency;
the solution that worked in the end was adding extra RAM to the machine.
I know that is not the best method, but none of the regular methods of
tweaking Nagios (large install, ramfs, etc.) worked, and the boost in
RAM reduced the latency from 6+ minutes to 3 seconds.

Assaf
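For reference, Nagios 3 exposes this throttle as the max_concurrent_checks option in nagios.cfg (0 means no limit). One rough way to judge whether that limit could be the bottleneck is Little's law: checks in flight = checks started per second x average execution time. The sketch below is a back-of-the-envelope model, not anything Nagios computes itself; the 300-second check interval and the 2x headroom factor are assumptions, since the thread doesn't state them.

```python
import math


def estimate_concurrency(num_checks: int, interval_s: float, avg_exec_s: float) -> float:
    """Little's law, L = lambda * W: average number of checks running at once."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rate_per_s = num_checks / interval_s  # checks started per second
    return rate_per_s * avg_exec_s


def suggest_max_concurrent(num_checks: int, interval_s: float,
                           avg_exec_s: float, headroom: float = 2.0) -> int:
    """Apply an assumed safety factor to the estimate and round up."""
    return math.ceil(estimate_concurrency(num_checks, interval_s, avg_exec_s) * headroom)


if __name__ == "__main__":
    # ~1000 check_icmp checks finishing in under 1 s each; the 300 s interval is assumed.
    in_flight = estimate_concurrency(1000, 300, 1.0)
    print(f"average checks in flight: {in_flight:.2f}")
    print("candidate max_concurrent_checks:", suggest_max_concurrent(1000, 300, 1.0))
```

With these assumed numbers only about 3-4 checks are in flight on average, so if max_concurrent_checks is already well above that, the limit by itself is unlikely to explain a 20-minute latency.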


--
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] High Latency with service checks

2011-02-16 Thread Yueh-Hung Liu
Another possibility: do you use an addon that runs a task after every
check? For example, updating RRD files for performance graphing can
sometimes increase check latency.


2011/2/16 Yu Watanabe yu.watan...@jp.fujitsu.com:
 Thank you very much for the reply.


 Assaf Flatto wrote:

Have you tried throttling the number of concurrent checks?
Could it be that the 1000 pings are flooding your network?

  I am not sure I understand what you mean by throttling...
  Is there a parameter in Nagios to control this?

  As for flooding, I will check netstat -s periodically and see whether
  the counters are too high.
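On Linux, the ICMP statistics that netstat -s summarizes come from the kernel counters in /proc/net/snmp, which list each protocol as a header line followed by a value line. A minimal sketch, assuming a Linux host, that takes two snapshots and reports how much the ICMP counters grew in between; a large jump in InErrors or OutErrors during a ping burst would hint at flooding. The function names are hypothetical.

```python
from __future__ import annotations

import os
import time

ICMP_KEYS = ("InMsgs", "InErrors", "InDestUnreachs", "OutMsgs", "OutErrors")


def parse_snmp(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/snmp: each protocol has a header line, then a value line."""
    tables: dict[str, dict[str, int]] = {}
    headers: dict[str, list[str]] = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if proto in headers:
            names = headers.pop(proto)
            tables[proto] = {name: int(value) for name, value in zip(names, fields)}
        else:
            headers[proto] = fields
    return tables


def icmp_delta(before: dict[str, dict[str, int]],
               after: dict[str, dict[str, int]],
               keys: tuple[str, ...] = ICMP_KEYS) -> dict[str, int]:
    """Counter growth between two snapshots for the ICMP keys present in both."""
    b, a = before.get("Icmp", {}), after.get("Icmp", {})
    return {k: a[k] - b[k] for k in keys if k in a and k in b}


def snapshot(path: str = "/proc/net/snmp") -> dict[str, dict[str, int]]:
    with open(path) as fh:
        return parse_snmp(fh.read())


if __name__ == "__main__":
    if os.path.exists("/proc/net/snmp"):
        first = snapshot()
        time.sleep(1)  # sample window; widen it to cover a full check cycle
        print(icmp_delta(first, snapshot()))
    else:
        print("/proc/net/snmp not found (this sketch assumes Linux)")
```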


I've encountered a similar issue with a setup I had (granted, the
latency wasn't that extreme), and from what you describe the symptoms
sound the same.

We tried several solutions (DNX, Mod_gearman) to reduce the latency;
the solution that worked in the end was adding extra RAM to the machine.
I know that is not the best method, but none of the regular methods of
tweaking Nagios (large install, ramfs, etc.) worked, and the boost in
RAM reduced the latency from 6+ minutes to 3 seconds.

  As far as I can see from vmstat, there seems to be enough memory left
  for buffers and cache, since there isn't any swapping.

 Thanks ,
 Yu


Assaf



Re: [Nagios-users] High Latency with service checks

2011-02-16 Thread Yu Watanabe
Thank you for the reply, Yueh.

We graph the results, but not after every check. I spool the performance
output to a file first.
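Spooling is the approach Nagios 3 supports natively through its bulk performance-data options (service_perfdata_file plus a periodic processing command), which keeps graphing work out of the check path. A minimal sketch of the batch side, assuming each spooled line carries a raw $SERVICEPERFDATA$ string in the plugin guidelines' 'label'=value[UOM];warn;crit;min;max format; the PerfValue type and parse_perfdata name are hypothetical, and warn/crit are kept as strings because they may be ranges.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# A token is either 'quoted label'=... or any run of non-space characters.
TOKEN_RE = re.compile(r"'[^']*'=\S*|\S+")
# Numeric value followed by an optional unit of measure (ms, %, B, s, ...).
VALUE_RE = re.compile(r"^([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)(\D*)$")


@dataclass
class PerfValue:
    label: str
    value: float
    uom: str
    warn: str
    crit: str
    min: Optional[float]
    max: Optional[float]


def _opt_float(text: str) -> Optional[float]:
    try:
        return float(text) if text else None
    except ValueError:
        return None


def parse_perfdata(perf: str) -> list[PerfValue]:
    """Parse one perfdata string into PerfValue records, skipping malformed tokens."""
    out: list[PerfValue] = []
    for tok in TOKEN_RE.findall(perf):
        if tok.startswith("'") and "'=" in tok:
            end = tok.index("'=", 1)
            label, rest = tok[1:end], tok[end + 2:]
        elif "=" in tok:
            label, _, rest = tok.partition("=")
        else:
            continue
        parts = rest.split(";")
        match = VALUE_RE.match(parts[0])
        if not match:
            continue  # e.g. 'U' (undetermined) values
        parts += [""] * (5 - len(parts))
        out.append(PerfValue(label, float(match.group(1)), match.group(2),
                             parts[1], parts[2], _opt_float(parts[3]), _opt_float(parts[4])))
    return out


if __name__ == "__main__":
    # Example check_icmp-style perfdata, as it might appear in a spool file.
    for pv in parse_perfdata("rta=0.213ms;200.000;500.000;0; pl=0%;40;80;;"):
        print(pv.label, pv.value, pv.uom)
```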

Thanks ,
Yu

Yueh-Hung Liu wrote:
Another possibility: do you use an addon that runs a task after every
check? For example, updating RRD files for performance graphing can
sometimes increase check latency.



Re: [Nagios-users] high latency

2010-12-12 Thread Andreas Ericsson
On 12/11/2010 07:14 PM, Frost, Mark {PBC} wrote:
 Andreas,
 
 It's just never worked for me and I thought you'd mentioned some time ago that
 OP5's git site just didn't support it.
 
 I've validated that my version of git (1.7.1) will grab code from a public
 site via our corporate proxy using other public code (the proxy is set up
 via the $http_proxy environment variable):
 
   $ git clone http://github.com/schacon/grack.git
   Initialized empty Git repository in /home/mfrost0/src/grack/.git/
   remote: Counting objects: 85, done.
   remote: Compressing objects: 100% (45/45), done.
   remote: Total 85 (delta 32), reused 80 (delta 31)
   Unpacking objects: 100% (85/85), done.
 
 but...
 
   $ git clone http://git.op5.org/nagios/merlin.git merlin-src
   Initialized empty Git repository in /home/mfrost0/src/merlin-src/.git/
   fatal: http://git.op5.org/nagios/merlin.git/info/refs not found: did you run git update-server-info on the server?
   $ git clone http://git.op5.org/nagios.git nagios-src
   Initialized empty Git repository in /home/mfrost0/src/nagios-src/.git/
   fatal: http://git.op5.org/nagios.git/info/refs not found: did you run git update-server-info on the server?
 
 so, you know :-(
 

Aight. I'll look into it tomorrow when I get to work. It's supposed
to work anyways.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-11 Thread Frost, Mark {PBC}
 -Original Message-
 From: Andreas Ericsson [mailto:a...@op5.se] 
 Sent: Tuesday, December 07, 2010 5:57 PM
 To: Frost, Mark {PBC}
 Cc: Nagios Users List
 Subject: Re: [Nagios-users] high latency
 
  
  Any chance that the OP5 site will eventually be
  configured to allow git through a proxy?  It's of course less convenient to
  use snapshot tarballs, but still workable, of course.
  
 
 You mean through http? Doesn't it already? I think it's supposed to. I can
 check up on that later. The gitweb page has links for grabbing latest master
 as a tarball though. That might work as an interim solution.


Andreas,

It's just never worked for me and I thought you'd mentioned some time ago that
OP5's git site just didn't support it.

I've validated that my version of git (1.7.1) will grab code from a public site
via our corporate proxy using other public code (the proxy is set up via the
$http_proxy environment variable):

$ git clone http://github.com/schacon/grack.git
Initialized empty Git repository in /home/mfrost0/src/grack/.git/
remote: Counting objects: 85, done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 85 (delta 32), reused 80 (delta 31)
Unpacking objects: 100% (85/85), done.

but...

$ git clone http://git.op5.org/nagios/merlin.git merlin-src
Initialized empty Git repository in /home/mfrost0/src/merlin-src/.git/
fatal: http://git.op5.org/nagios/merlin.git/info/refs not found: did you run git update-server-info on the server?
$ git clone http://git.op5.org/nagios.git nagios-src
Initialized empty Git repository in /home/mfrost0/src/nagios-src/.git/
fatal: http://git.op5.org/nagios.git/info/refs not found: did you run git update-server-info on the server?

so, you know :-(

Thanks

Mark



Re: [Nagios-users] high latency

2010-12-07 Thread Frost, Mark {PBC}

 -Original Message-
 From: Andreas Ericsson [mailto:a...@op5.se] 
 Sent: Tuesday, December 07, 2010 9:44 AM
 
  Hmm.  So then I'm very curious why the 2 distservers, which both use
  oc[sh]p commands the same way, have such radically different latencies.
  

 Agreed. There must be other differences too. Perhaps there's trouble resolving
 from one of the nodes? That usually makes checks run a helluva lot longer than
 they normally have to.

I had another look.  I found a test host I'd made that was deliberately
unreachable, but removing it made no difference.  Execution times
(min/max/avg) are significantly lower on the host with the high latencies
than on the one with the low latencies.  I don't see any unresolvable hosts
or, now, any unreachable hosts.  Puzzling.

I've always wished there was an easy way to see which processes had
high latencies from the web interface without having to view the status.dat
file...
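Until the web interface grows such a view, a short script can do the status.dat digging. A minimal sketch, assuming the Nagios 3 status.dat layout of "servicestatus {" blocks with one key=value per line, and assuming check_type=0 marks active checks; the default path is only a guess at a source install, so pass the real one as an argument. It prints a min / max / avg line in the same shape as the latency figures quoted in this thread, then the worst offenders.

```python
from __future__ import annotations

import os
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceLatency:
    host: str
    service: str
    latency: float
    exec_time: float


def parse_status_dat(text: str) -> list[dict[str, str]]:
    """Return one dict per 'servicestatus { ... }' block; other blocks are ignored."""
    blocks: list[dict[str, str]] = []
    current: Optional[dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            current = {} if line.split()[0] == "servicestatus" else None
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")  # plugin output may contain '='
            current[key] = value
    return blocks


def active_latencies(blocks: list[dict[str, str]]) -> list[ServiceLatency]:
    """Keep active checks only (check_type 0 is assumed to mean active)."""
    return [
        ServiceLatency(b.get("host_name", ""), b.get("service_description", ""),
                       float(b.get("check_latency", "0")),
                       float(b.get("check_execution_time", "0")))
        for b in blocks if b.get("check_type", "0") == "0"
    ]


def summarize(values: list[float]) -> tuple[float, float, float]:
    """(min, max, average); all zeros for an empty list."""
    if not values:
        return 0.0, 0.0, 0.0
    return min(values), max(values), sum(values) / len(values)


def worst(services: list[ServiceLatency], n: int = 10) -> list[ServiceLatency]:
    return sorted(services, key=lambda s: s.latency, reverse=True)[:n]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/var/status.dat"  # assumed path
    if not os.path.exists(path):
        print(f"status.dat not found at {path}")
    else:
        with open(path, errors="replace") as fh:
            services = active_latencies(parse_status_dat(fh.read()))
        lo, hi, avg = summarize([s.latency for s in services])
        print(f"Active Service Latency: {lo:.3f} / {hi:.3f} / {avg:.3f} sec")
        for s in worst(services):
            print(f"{s.latency:10.3f}s  {s.host} / {s.service}")
```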

  Either way, you're suggesting that having a NEB module handle the
  post-check work will eliminate the serialization.

 Yes. Sneaking a peek at what's needed in order for an event to get sent to
 master via an eventbroker compared to running an oc[sh]p command renders
 this, more or less:

 [ good stuff snipped...]

Wow.

 In terms of effort, the difference is sort of like either hopping on one
 leg along the entire Great Wall of China or walking to the kitchen to grab
 a beer.

  
  parallelize_check is set to 1 everywhere.
 
 Does one server have a lot of random service failures? On-demand host checks
 are still run in parallel.

I don't think so.  Intermittent you mean?  Not as far as I know or can see.

   What version of Nagios are you running?
  
  3.2.1
 
 I take it upgrading makes no difference?

To 3.2.3?  I'll probably try that on the new servers, but if things work out
I may just move to Merlin + 3.2.4.  I wasn't sure I saw anything in the 3.2.3
release that I found compelling for us at the time.  As I say, this system now
has fairly high visibility, so just trying something like that would involve a
rather painful internal change process.  It's like piloting the QE2 -- I can't
change course very quickly :-)

  Thanks, Andreas.  I'm hoping to allocate sufficient resources on the new
  servers to be able to play with Merlin more there.
 
 It's quite resource-friendly actually. Well, compared to what you're running
 now it's positively feather-light.

I meant more like installing MySQL everywhere, building filesystems to hold the
MySQL data, etc.  Not so much like I need more memory or more CPUs.  I don't
remember seeing anything in the Merlin docs (maybe I missed it), but how
large would the MySQL database need to be?  Pretty small on each box, right?
Like 500MB or less?

   Will I be able to have the performance
  data from a poller be sent up to a NOC for digestion by pnp4nagios?

 Yes, but you'll need the threadsafe version of Nagios you can obtain from
 either CVS or git://git.op5.org/nagios.git for performance-data to work.
 Actually, you need that for Merlin to work.

That's part of the plan.  Any chance that the OP5 site will eventually be
configured to allow git through a proxy?  It's of course less convenient to
use snapshot tarballs, but still workable, of course.

  It may have been a long time ago, but I thought I remembered seeing that
  performance data was not yet implemented.
  
 
 That was then. This is now :)

Spifftacular!

  No, we'd be using some flavor of SLES.
  
 
 Should work marvellously then.

Thanks as always for your help, Andreas.

Mark



Re: [Nagios-users] high latency

2010-12-06 Thread Andreas Ericsson
On 12/03/2010 07:59 PM, Daniel Wittenberg wrote:
 It appears that Nagios spawns lots and lots of new procs for all the
 various tasks it does, check results and such.  I was curious: wouldn't
 a model more like Apache's work better?  Something like a queue for work,
 with worker processes that grab jobs off that queue, run a bunch of
 different jobs, then die, rather than each performing just one task?
 That seems like it would still maintain stability and offer performance
 gains.
 

It probably would, and it's on the roadmap to rewrite those parts of
Nagios to something similar to what you've described.
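The model Daniel describes resembles Apache's prefork design with a per-child request cap (MaxRequestsPerChild). A minimal sketch of that shape, not of Nagios's actual code: threads stand in for the forked worker processes a real implementation would use, and run_pool, worker and max_jobs_per_worker are hypothetical names. Each worker drains up to max_jobs_per_worker jobs from a shared queue and then exits; a supervisor loop keeps the pool topped up until the queue is empty.

```python
from __future__ import annotations

import queue
import threading
import time
from typing import Any, Callable


def worker(jobs: queue.Queue, results: list, lock: threading.Lock, max_jobs: int) -> None:
    """Run at most max_jobs jobs, then exit so the supervisor can replace this worker."""
    for _ in range(max_jobs):
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to do
        try:
            value: Any = job()
        except Exception as exc:  # record failures instead of killing the worker
            value = exc
        with lock:
            results.append(value)


def run_pool(tasks: list[Callable[[], Any]], pool_size: int = 4,
             max_jobs_per_worker: int = 25) -> tuple[list, int]:
    """Process all tasks with recycled workers; returns (results, workers_spawned)."""
    jobs: queue.Queue = queue.Queue()
    for task in tasks:
        jobs.put(task)
    results: list = []
    lock = threading.Lock()
    live: list[threading.Thread] = []
    spawned = 0
    while True:
        live = [w for w in live if w.is_alive()]
        if jobs.empty() and not live:
            break  # every job was dequeued and every worker has finished
        while len(live) < pool_size and not jobs.empty():
            w = threading.Thread(target=worker, args=(jobs, results, lock, max_jobs_per_worker))
            w.start()
            live.append(w)
            spawned += 1
        time.sleep(0.005)
    return results, spawned


if __name__ == "__main__":
    res, n = run_pool([lambda i=i: i * i for i in range(100)], pool_size=4, max_jobs_per_worker=10)
    print(len(res), "results from", n, "workers")
```

Capping jobs per worker is what gives the "then die" property: a leaky worker is recycled before it grows, while the pool still avoids one fork per task.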




Re: [Nagios-users] high latency

2010-12-06 Thread Andreas Ericsson
On 12/03/2010 08:14 PM, Frost, Mark {PBC} wrote:
 
 Can the use of dependencies also be the cause of increased latencies?
 

If they're very deep, it's possible. Otherwise it really shouldn't matter
all that much. It will of course add *some* load, but it shouldn't be enough
to cause latency.
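As a rough gauge of "very deep", one can measure the longest chain of dependencies a check sits at the end of. A minimal sketch over a hypothetical map from each dependent check to the checks it depends on (building that map from servicedependency definitions is left out); a check with no masters has depth 1, so a service gated on a single NRPE check has depth 2.

```python
from __future__ import annotations


def max_dependency_depth(deps: dict[str, list[str]]) -> int:
    """Longest master->dependent chain; raises ValueError on a dependency cycle."""
    memo: dict[str, int] = {}
    visiting: set[str] = set()

    def depth(node: str) -> int:
        if node in memo:
            return memo[node]
        if node in visiting:
            raise ValueError(f"dependency cycle at {node!r}")
        visiting.add(node)
        d = 1 + max((depth(m) for m in deps.get(node, [])), default=0)
        visiting.discard(node)
        memo[node] = d
        return d

    nodes = set(deps) | {m for masters in deps.values() for m in masters}
    return max((depth(n) for n in nodes), default=0)


if __name__ == "__main__":
    # Hypothetical layout: Windows checks depend on that host's NRPE check.
    deps = {"win1/CPU": ["win1/NRPE"], "win1/Disk": ["win1/NRPE"], "win1/App": ["win1/Disk"]}
    print("deepest chain:", max_dependency_depth(deps))
```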

 I too struggle with them and I'm running on lightly-loaded physical hardware.
 We have 2 servers doing the checks sending back to a central server.  Both
 distributed nodes use ocsp/ochp, but they do nothing more than append results
 to a file (i.e. it exits quickly).  Results are handled outside of Nagios.
 

Try getting rid of the oc[sh]p commands and use Merlin or google for pnsca or
persistent nsca. There's one available from op5's repositories that may or may
not work, and there's one from somewhere else that they're apparently using to
great effect.

Even if it exits quickly, it's still executed serially, so checking halts
briefly for each and every check that runs.
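To put numbers on that: if every result has to pass through one serial oc[sh]p call, the scheduler loses roughly checks x per-call time in each check interval. A back-of-the-envelope sketch using distserver2's 4289 services; the 300-second interval and the 20 ms per call are assumptions, not measurements from this thread. Once the blocked fraction approaches 1, latency can only grow.

```python
def blocked_seconds(checks_per_interval: int, per_call_s: float) -> float:
    """Wall-clock time spent inside serial oc[sh]p calls during one interval."""
    return checks_per_interval * per_call_s


def blocked_fraction(checks_per_interval: int, interval_s: float, per_call_s: float) -> float:
    """Share of the interval the scheduler is blocked; >= 1.0 means it can never catch up."""
    return blocked_seconds(checks_per_interval, per_call_s) / interval_s


def break_even_call_time(checks_per_interval: int, interval_s: float) -> float:
    """Per-call cost at which the serial calls alone would fill the whole interval."""
    return interval_s / checks_per_interval


if __name__ == "__main__":
    services, interval, per_call = 4289, 300.0, 0.020  # interval and per_call are assumed
    print(f"blocked per interval: {blocked_seconds(services, per_call):.2f} s")
    print(f"blocked fraction:     {blocked_fraction(services, interval, per_call):.1%}")
    print(f"break-even per call:  {break_even_call_time(services, interval) * 1000:.1f} ms")
```

With those assumed figures, roughly 86 seconds of every 300 go to waiting on oc[sh]p, and the calls alone would saturate the scheduler at about 70 ms each.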

 What's odd is that distserver1 and distserver2 are configured the same:
 
 distserver1:
 Hosts Checked:  675
 Services Checked:  4179
 Active Service Latency (min / max / avg):  0.000 / 3.155 / 0.382 sec
 Active Service Execution Time (min / max / avg):  0.000 / 60.038 / 0.145 sec
 
 distserver2:
 Hosts Checked:  261
 Services Checked:  4289
 Active Service Latency (min / max / avg):  0.000 / 169.977 / 81.300 sec
 Active Service Execution Time (min / max / avg):  0.000 / 15.270 / 0.211 sec
 
 Yet, as you can see, distserver2's latency is much higher and always has
 been.  I tried turning off EPN yesterday on distserver2 and it had no
 discernible effect.  We added 400 new service checks yesterday on
 distserver2 (just more of the same checks we already do, but on 26 new
 hosts) and the latency went from 35 to over 80.
 

What kind of checks are you running? Some plugins draw a lot of CPU.
Are any of the checks set to run serially? (grep for parallelize_check in
your objects.cache file.)
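The grep can also be done per object, which tells you which host/service pairs are affected rather than just whether any are. A minimal sketch, assuming objects.cache keeps its usual layout of "define service {" blocks with one tab-separated directive per line; find_serial_services is a hypothetical helper and the default path is only a guess.

```python
from __future__ import annotations

import os
import sys


def find_serial_services(text: str) -> list[tuple[str, str]]:
    """Return (host_name, service_description) for services with parallelize_check 0."""
    serial: list[tuple[str, str]] = []
    obj_type = None
    current: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("define ") and line.endswith("{"):
            obj_type = line[len("define "):-1].strip()
            current = {}
        elif line == "}":
            if obj_type == "service" and current.get("parallelize_check") == "0":
                serial.append((current.get("host_name", ""),
                               current.get("service_description", "")))
            obj_type, current = None, {}
        elif obj_type is not None and line:
            parts = line.split(None, 1)  # directive, then the (possibly spaced) value
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return serial


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/var/objects.cache"  # assumed path
    if os.path.exists(path):
        with open(path, errors="replace") as fh:
            for host, service in find_serial_services(fh.read()):
                print(f"serial: {host} / {service}")
    else:
        print(f"objects.cache not found at {path}")
```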

What version of Nagios are you running?

 The checks we do are very different (Windows, Linux, Unix, many are
 app-centric), so it's difficult to compare exactly what runs on distserver1
 and distserver2, but given yesterday's jump, I'm wondering whether the fact
 that the checks on these new hosts are all built on dependencies has
 something to do with it.  These hosts (Windows) have a basic check for
 NRPE, and all other checks on the host are dependent on the NRPE check
 succeeding.
 
 I have to move to all-new Nagios servers very soon.  I'm interested in
 Merlin, but given its non-production nature just yet, I'm hesitant to
 commit, and I'm not sure if it will help me here.
 

It's been running at our 400+ customers with very few problems for the past
month. 0.9.1, released just yesterday, solves the known issues our customers
have encountered. You might want to take a look at it again. There are some
issues on FreeBSD though (was that you reporting them?). I just recently got
a new laptop with better support for running virtual systems, so I'm
downloading a FreeBSD 8.1 install DVD as we speak. Hopefully I'll have those
issues sorted out before the end of the week.




Re: [Nagios-users] high latency

2010-12-04 Thread Daniel Wittenberg
I did some testing today with EPN on and off, and it didn't seem to make
any difference in our latency times.  Not overly scientific, but it
looked about the same running a few hours each way.

Dan

-Original Message-
From: Max Schubert [mailto:m...@webwizarddesign.com] 
Sent: Friday, December 03, 2010 7:03 AM
To: Andreas Ericsson; Nagios Users List
Subject: Re: [Nagios-users] high latency

Latency increases much more quickly for us without EPN, as execution
times are noticeably longer per check.

We use RHEL 5.x, so the Perl is 5.8.8.

We have semi-daily updates to our pollers, and with EPN that means
cold restarts, so memory leaks have not been noticeable in that
scenario. On test hosts, or hosts where we are doing burn-ins, the leak
is negligible enough that we can go for 2-3 days with no memory issues;
we always hit service latency thresholds first.

7 seconds is in general where we have to force a restart of our
pollers to prevent metric collection and snmp delta calculation
issues.
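A restart threshold like that can be watched with an ordinary check. A minimal sketch of the decision logic using the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN): the 7-second critical level is Max's figure, while the 5-second warning level and the names latency_state, main and check_latency are assumptions. main returns the exit code rather than calling sys.exit(), so a wrapper script would end with raise SystemExit(main(sys.argv[1:])); the average itself could come from nagiostats or status.dat.

```python
from __future__ import annotations

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def latency_state(avg_latency_s: float, warn_s: float = 5.0, crit_s: float = 7.0) -> tuple[int, str]:
    """Map an average service latency to a plugin state and one line of output."""
    if avg_latency_s < 0:
        return UNKNOWN, f"UNKNOWN - invalid latency {avg_latency_s}"
    perf = f"|avg_latency={avg_latency_s:.3f}s;{warn_s};{crit_s};0;"
    if avg_latency_s >= crit_s:
        return CRITICAL, f"CRITICAL - average service latency {avg_latency_s:.1f}s{perf}"
    if avg_latency_s >= warn_s:
        return WARNING, f"WARNING - average service latency {avg_latency_s:.1f}s{perf}"
    return OK, f"OK - average service latency {avg_latency_s:.1f}s{perf}"


def main(argv: list[str]) -> int:
    """Print the status line and return the exit code for the given latency argument."""
    try:
        latency = float(argv[0])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_latency <average_latency_seconds>")
        return UNKNOWN
    code, message = latency_state(latency)
    print(message)
    return code
```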

Max

On 12/3/10, Andreas Ericsson a...@op5.se wrote:
 On 12/03/2010 12:46 PM, Max Schubert wrote:
 I find it interesting that a number of users get performance
 improvements with embedded perl off - we lose 20-40% polling capacity
 perl poller with it off.


 How do you mean that you're losing capacity? Does latency start to creep
 upwards or is load increasing?

 Out of interest: how much memory does epn leak nowadays, and which perl
 version is it compiled against?






Re: [Nagios-users] high latency

2010-12-03 Thread Andreas Ericsson
On 12/02/2010 06:42 PM, Daniel Wittenberg wrote:
 
 Embedded Perl is interesting though; I hadn't tried that. I thought it was
 supposed to help with performance.

In theory, it does. It probably does in practice too, but the problems
associated with it make it not worth it.

  I don't think we have any obsessive
 stuff running right now.
 

Check if you're not sure.

 Right now the hardware is 4-proc VMware ESX with 4GB RAM.  For production
 there will be 12 of those boxes, with about 1200-1500 hosts per Nagios
 server.
 

Virtual systems. Bleh. Anyways, if you're going to use a load-balanced setup
you should look into using Merlin. That way you get complete failover for
free.




Re: [Nagios-users] high latency

2010-12-03 Thread Andreas Ericsson
On 12/02/2010 08:38 PM, Daniel Wittenberg wrote:
 Someone else noticed that Nagios is generating a ton of minor page
 faults. Is that normal, and could it be causing some of the latency in
 the checks?

define a ton

$ /usr/bin/time php -r 'echo "marsipulami\n";'
marsipulami
0.01user 0.01system 0:00.09elapsed 34%CPU (0avgtext+0avgdata 29104maxresident)k
10208inputs+0outputs (70major+1962minor)pagefaults 0swaps

That's with a reasonably simple program, and it generates 70 major and 1962
minor pagefaults.
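To "define a ton" for Nagios itself rather than for php, the same counters can be read programmatically. A minimal sketch, assuming a Unix-like system for resource.getrusage and Linux for /proc: child_pagefaults measures one command the way /usr/bin/time does, and proc_stat_faults reads the cumulative minflt/majflt fields (fields 10 and 12 of /proc/<pid>/stat), so you can sample a running nagios process twice and compare. All helper names are hypothetical.

```python
from __future__ import annotations

import resource
import subprocess
import sys


def child_pagefaults(cmd: list[str]) -> tuple[int, int]:
    """Run cmd to completion and return the (minor, major) page faults it incurred."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)  # waits for the child
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_minflt - before.ru_minflt, after.ru_majflt - before.ru_majflt


def parse_proc_stat_faults(stat_line: str) -> tuple[int, int]:
    """(minflt, majflt) from a /proc/<pid>/stat line; comm may contain spaces."""
    fields = stat_line[stat_line.rindex(")") + 2:].split()  # fields[0] is field 3 (state)
    return int(fields[7]), int(fields[9])  # fields 10 and 12 in proc(5) numbering


def proc_stat_faults(pid: int) -> tuple[int, int]:
    with open(f"/proc/{pid}/stat") as fh:
        return parse_proc_stat_faults(fh.read())


if __name__ == "__main__":
    minor, major = child_pagefaults([sys.executable, "-c", "pass"])
    print(f"python -c pass: {major} major, {minor} minor page faults")
```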

  I've also got a tmpfs set up for the
 status.dat and the checkresults directory to ease some of the disk I/O,
 since we're on a SAN-backed VM host.
 

That's good, although if you're using a virtual system you'll never know
for sure if you're really using a ramdisk or not, since the host system
might well use swap to store the ramdisk anyway.

 I turned off embedded perl this morning and our latency has been holding
 at  10 seconds so far, so that seemed to help a lot.
 

Neat. Did it affect your pagefaults? If so, how?




Re: [Nagios-users] high latency

2010-12-03 Thread Max Schubert
I find it interesting that a number of users get performance
improvements with embedded perl off - we lose 20-40% polling capacity
perl poller with it off.

- Max



Re: [Nagios-users] high latency

2010-12-03 Thread Andreas Ericsson
On 12/03/2010 12:46 PM, Max Schubert wrote:
 I find it interesting that a number of users get performance
 improvements with embedded perl off - we lose 20-40% polling capacity
 perl poller with it off.
 

How do you mean that you're losing capacity? Does latency start to creep
upwards or is load increasing?

Out of interest; How much memory does epn leak nowadays, and which perl
version is it compiled against?

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-03 Thread Max Schubert
Latency increases much more quickly for us without epn, as execution
times are noticeably longer per check.

We use RHEL 5.x, so the perl is 5.8.8.

We have semi-daily updates to our pollers, and with epn that means
cold restarts, so memory leaks have not been noticeable in that
scenario. On test hosts, or hosts where we are doing burn-ins, the leak
is negligible enough that we can go for 2-3 days with no memory issues;
we always hit service latency thresholds first.

Seven seconds of latency is generally where we have to force a restart
of our pollers to prevent metric collection and SNMP delta calculation
issues.

Max

On 12/3/10, Andreas Ericsson a...@op5.se wrote:
 On 12/03/2010 12:46 PM, Max Schubert wrote:
 I find it interesting that a number of users get performance
 improvements with embedded perl off - we lose 20-40% polling capacity
 perl poller with it off.


 How do you mean that you're losing capacity? Does latency start to creep
 upwards or is load increasing?

 Out of interest; How much memory does epn leak nowadays, and which perl
 version is it compiled against?

 --
 Andreas Ericsson   andreas.erics...@op5.se
 OP5 AB www.op5.se
 Tel: +46 8-230225  Fax: +46 8-230231

 Considering the successes of the wars on alcohol, poverty, drugs and
 terror, I think we should give some serious thought to declaring war
 on peace.




Re: [Nagios-users] high latency

2010-12-03 Thread Daniel Wittenberg
Pagefaults - 20-30k.  This seems to be the source of most of the cpu
system time (understandably), which sits about 40-50%.  So if I could
reduce the pagefaults I think we could gain quite a bit of performance
back.

I found one other huge issue...somehow in the generic service check, the
check_interval was set to 5 minutes...however, normal_check_interval
wasn't set at all and appeared to be checking every minute. I deleted
check_interval and added normal_check_interval and that helped a ton,
latency went down to 0.5-1.5 seconds.  That was only running 2 active
checks and about a dozen passive on 700 hosts.  I then added back in the
other 9 active checks and latency once again shot back up to about 2000
*sigh*.

I grabbed another vm and made it a dnx client and that seemed to help,
but wish I could get the main server to handle more.  Right now it has
about 700 hosts and 12,100 service checks, of which about 7000 are
active and rest are passive.
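The interval mix-up above matters because the scheduler's required start rate scales inversely with the check interval. Here is a back-of-the-envelope sketch. It assumes the stock `interval_length` of 60 seconds and checks spread evenly over the interval. The function name is illustrative.

```python
def required_check_rate(active_checks, check_interval, interval_length=60):
    """Average number of checks per second the scheduler must start.

    check_interval is in Nagios "time units"; interval_length is the number
    of seconds per unit (assumed 60 here, as in a stock nagios.cfg).
    """
    return active_checks / (check_interval * interval_length)


if __name__ == "__main__":
    for minutes in (1, 5):
        rate = required_check_rate(7000, minutes)
        print(f"7000 active checks every {minutes} min: {rate:.1f} checks/s")
```

With 7000 active checks, a 1-minute interval needs roughly five times as many check starts per second as a 5-minute one. That fits the latency drop seen once the intended interval took effect.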

Oh, and we do have obsessive turned off.  I've even gone through as many
configs as I could and removed the macros too, until I can write a
caching mechanism for the macro statements.

Any more ideas? 

-Original Message-
From: Andreas Ericsson [mailto:a...@op5.se] 
Sent: Friday, December 03, 2010 5:39 AM
To: Nagios Users List
Cc: Daniel Wittenberg
Subject: Re: [Nagios-users] high latency

On 12/02/2010 08:38 PM, Daniel Wittenberg wrote:
 Someone else noticed that nagios is generating a ton of minor page
 faults, and curious if that's normal and if that could be causing some
 of the latency in the checks?

define a ton

$ /usr/bin/time php -r 'echo "marsipulami\n";'
marsipulami
0.01user 0.01system 0:00.09elapsed 34%CPU (0avgtext+0avgdata 29104maxresident)k
10208inputs+0outputs (70major+1962minor)pagefaults 0swaps

That's with a reasonably simple program, and it generates 70 major and 1962
minor pagefaults.

  I've also got a tmpfs setup for the
 status.dat and the checkresults directory to ease some of the disk i/o
 since we're on a san-backed vm host.
 

That's good, although if you're using a virtual system you'll never know
for sure if you're really using a ramdisk or not, since the host system
might well use swap to store the ramdisk anyway.

 I turned off embedded perl this morning and our latency has been holding
 at  10 seconds so far, so that seemed to help a lot.
 

Neat. Did it affect your pagefaults? If so, how?

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-03 Thread Andreas Ericsson
On 12/03/2010 04:31 PM, Daniel Wittenberg wrote:
 Pagefaults - 20-30k.  This seems to be the source of most of the cpu
 system time (understandably), which sits about 40-50%.  So if I could
 reduce the pagefaults I think we could gain quite a bit of performance
 back.
 

Over what period of time? Here's from a program running a mere 1.22s,
showing 13k pagefaults. The majority of that time is *not* spent trying
to load the swapped out mmap regions, but in delta chain lookups inside
the program logic. And so the output:

$ time git repack
Counting objects: 397, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (397/397), done.
Writing objects: 100% (397/397), done.
Total 397 (delta 238), reused 0 (delta 0)
0.28user 0.09system 0:01.22elapsed 30%CPU (0avgtext+0avgdata 20544maxresident)k
6368inputs+464outputs (297major+12959minor)pagefaults 0swaps

I really think you're misunderstanding what pagefaults are and how they
work. Starting an X-server or openoffice.org is likely to generate somewhere
around a million pagefaults each, simply because they use a lot of libraries,
read a lot of config files, invoke a lot of helper programs and attempt to
access various devices. 20-30k pagefaults is *nothing* for a cpu capable of
executing a couple of billion instructions per second.
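Andreas's question about the time period is the key one: a page-fault count only means something as a rate. On Linux, a running process's cumulative counters are in `/proc/<pid>/stat`. Per proc(5), `minflt` is field 10 and `majflt` is field 12, so two samples give faults per second. The sketch below assumes Linux, and the helper names are hypothetical.

```python
import time


def parse_fault_counters(stat_line):
    """Return (minflt, majflt) from one line of /proc/<pid>/stat.

    Field 2 (comm) is parenthesised and may contain spaces, so split on
    the last ')' first; the remainder then starts at field 3 (state).
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[7]), int(rest[9])  # field 10 = minflt, field 12 = majflt


def read_stat(pid):
    """Read the raw stat line for a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read()


def fault_rate(pid, interval_s=10.0):
    """Sample a live process twice; return (minor faults/s, major faults/s)."""
    min_a, maj_a = parse_fault_counters(read_stat(pid))
    time.sleep(interval_s)
    min_b, maj_b = parse_fault_counters(read_stat(pid))
    return (min_b - min_a) / interval_s, (maj_b - maj_a) / interval_s
```

For example, if the 20-30k faults were counted over a full minute, that would be only a few hundred per second.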


 I found one other huge issue...somehow in the generic service check, the
 check_interval was set to 5 minutes...however, normal_check_interval
 wasn't set at all and appeared to be checking every minute. I deleted
 check_interval and added normal_check_interval and that helped a ton,
 latency went down to 0.5-1.5 seconds.  That was only running 2 active
 checks and about a dozen passive on 700 hosts.  I then added back in the
 other 9 active checks and latency once again shot back up to about 2000
 *sigh*.
 

You're doing something weird. I'm 100% certain that this isn't Nagios'
fault. Any chance you could share your config off-list? Remove passwords
and addresses first if you like.

 I grabbed another vm and made it a dnx client and that seemed to help,
 but wish I could get the main server to handle more.  Right now it has
 about 700 hosts and 12,100 service checks, of which about 7000 are
 active and rest are passive.
 

Umm... First you said you added 9 checks and that made the entire thing
just blow up, and now you're running 7000 active checks. What checks are
you running? If you sort by cpu usage in top, is there anyone that's
really prominent?

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-03 Thread Daniel Wittenberg
Sorry for the confusion on that. I added 9 checks to *each* host, and
there are about 700 hosts.  No, it's all the nagios daemon itself (nagios
-uxd).  It feels like if I add that many more checks, it has a hard
time doing the checks and processing the results, since if I either move
the active checking to dnx or drop them completely, the load and latency
times drop.

Dan

 I grabbed another vm and made it a dnx client and that seemed to help,
 but wish I could get the main server to handle more.  Right now it has
 about 700 hosts and 12,100 service checks, of which about 7000 are
 active and rest are passive.
 

Umm... First you said you added 9 checks and that made the entire thing
just blow up, and now you're running 7000 active checks. What checks are
you running? If you sort by cpu usage in top, is there anyone that's
really prominent?

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-03 Thread Daniel Wittenberg
It appears that nagios spawns lots and lots of new procs for all the
various tasks it does, check results and such.  I was curious, wouldn't
a model more like Apache's work better?  Something like a queue for work,
with worker processes that grab jobs off that queue, run a bunch of different
jobs, then die, rather than each performing just one task?  That seems like
it would still maintain stability and offer higher performance.

Dan

-Original Message-
From: Andreas Ericsson [mailto:a...@op5.se] 
Sent: Friday, December 03, 2010 5:22 AM
To: Daniel Wittenberg
Cc: Nagios Users List
Subject: Re: [Nagios-users] high latency

On 12/02/2010 06:42 PM, Daniel Wittenberg wrote:
 
 Embedded perl is interesting though; I hadn't tried that, and thought it was
 supposed to help with performance.

In theory, it does. It probably does in practice too, but the problems
associated with it make it not worth it.

  I don't think we have any obsessive
 stuff running right now.
 

Check if you're not sure.

 Right now hardware is 4 proc vmware esx, 4GB RAM.  For production there
 will be 12 of those boxes with the number of hosts being about 1200-1500
 per nagios server.
 

Virtual systems. Bleh. Anyway, if you're going to use a load-balanced setup
you should look into using Merlin. That way you get complete failover
for free.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-03 Thread Frost, Mark {PBC}

Can the use of dependencies also be the cause of increased latencies?

I too struggle with them and I'm running on lightly-loaded physical hardware.
We have 2 servers doing the checks sending back to a central server.  Both
distributed nodes use ocsp/ochp, but they do nothing more than append results
to a file (i.e. it exits quickly).  Results are handled outside of Nagios.

What's odd is that distserver1 and distserver2 are configured the same:

distserver1:
Hosts Checked:  675
Services Checked:  4179
Active Service Latency: 0.000 / 3.155 / 0.382 sec
Active Service Execution Time:  0.000 / 60.038 / 0.145 sec

distserver2:
Hosts Checked:  261
Services Checked:  4289
Active Service Latency: 0.000 / 169.977 / 81.300 sec
Active Service Execution Time:  0.000 / 15.270 / 0.211 sec

yet as you can see, distserver2's latency is much higher and always has been.
I tried turning off EPN yesterday on distserver2 and it had no discernible
effect. We added 400 new service checks yesterday on distserver2 (just more of
the same checks we already do, but on 26 new hosts) and the latency went from
35 to over 80 seconds.
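The min / max / average triplets above can also be recomputed from status.dat, for example to graph them over time. This sketch assumes the Nagios 3 status.dat layout: `servicestatus { ... }` blocks of `key=value` lines, with `check_type=0` marking active checks. Verify that against your own file before relying on it. `latency_stats` is a hypothetical helper.

```python
def latency_stats(status_text):
    """Return (min, max, avg) check_latency over active service checks, or None.

    Host blocks and passive services (check_type=1) are ignored.
    """
    latencies = []
    block = None  # key/value pairs of the servicestatus block being read
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "servicestatus {":
            block = {}
        elif line == "}" and block is not None:
            if block.get("check_type") == "0" and "check_latency" in block:
                latencies.append(float(block["check_latency"]))
            block = None
        elif block is not None and "=" in line:
            key, _, value = line.partition("=")
            block[key] = value
    if not latencies:
        return None
    return min(latencies), max(latencies), sum(latencies) / len(latencies)
```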

The checks we do are very different (Windows, Linux, Unix, many are
app-centric), so it's difficult to compare exactly what runs on distserver1
and distserver2. But given yesterday's jump, I'm wondering whether the fact
that the checks on these new hosts are all built on dependencies has something
to do with it. These hosts (Windows) have a basic check for NRPE, and all other
checks on the host depend on the NRPE check succeeding.
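Here is a simplified model of Nagios' `execution_failure_criteria` for reasoning about what an execution dependency like this does. If the master service (here, the NRPE check) is in one of the listed states, the dependent check is not executed. The model ignores inheritance, host dependencies and the predictive dependency checks Nagios 3 can perform. The function names are illustrative.

```python
# Single-letter state codes as used in execution_failure_criteria.
STATE_CODES = {"OK": "o", "WARNING": "w", "UNKNOWN": "u",
               "CRITICAL": "c", "PENDING": "p"}


def should_execute(master_state, execution_failure_criteria):
    """True if a dependent check may run given its master's current state.

    The criteria string is comma-separated, e.g. "w,u,c"; "n" means the
    dependency never blocks execution.
    """
    codes = {c.strip() for c in execution_failure_criteria.split(",")}
    if "n" in codes:
        return True
    return STATE_CODES[master_state] not in codes


def runnable_checks(dependents, master_state, criteria):
    """Filter a host's dependent checks down to those that would execute."""
    return [svc for svc in dependents if should_execute(master_state, criteria)]
```

In this model a failing master reduces how many dependent checks execute. It says nothing about the scheduling overhead of evaluating the dependencies themselves, which is what the question above is really about.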

I have to move to all-new Nagios servers very soon. I'm interested in Merlin,
but given that it isn't production-ready just yet, I'm hesitant to commit, and
I'm not sure it will help me here.

Thanks

Mark



Re: [Nagios-users] high latency

2010-12-02 Thread Daniel Wittenberg
Yeah, for giggles I went back further through the archives last night
and found stuff back to 2.x series, and not much has seemed to help.  I
killed some of my mis-behaving active checks, and that dropped to about
20 seconds, then went up to about 35-50.  So while that's better, I have
A LOT more hosts and service checks to add, and am afraid it'll go nuts
when I dump more on.  I think I've tried about all the config options I
could find and some helped, some didn't seem to, but  there should be
plenty of horsepower on the machine to run this much faster so not sure
why it's not.

 

Dan

 

From: Assaf Flatto [mailto:nag...@flatto.net] 
Sent: Wednesday, December 01, 2010 11:26 AM
To: Nagios Users List
Cc: Daniel Wittenberg
Subject: Re: [Nagios-users] high latency

 

Dan

There were a couple of discussions on the list that dealt with latency
issues.

Have you tried looking at the list archives about the topic?

Assaf


On 01/12/10 16:00, Daniel Wittenberg wrote: 

I've been watching my latency graphs, and they show 2000 seconds for some
service and host checks. What I don't understand is that I still have idle
time on the CPU (quad processor), so if the server isn't in trouble, why
am I seeing such high latency? Or maybe I misunderstand how latency is
calculated? I do have 9 service checks that are failing on about 700
hosts, if that matters at all. I'm trying to tweak the performance to the
max on this, so any insight is welcome.
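On the question of how latency is calculated: Nagios reports a check's latency as the difference between the time it was scheduled to run and the time it actually started. Latency can therefore grow while the CPU still has idle time, as long as checks become due faster than the scheduler starts them. The toy FIFO model below (not Nagios' real scheduler) shows that a constant shortfall makes latency climb without bound.

```python
from collections import deque


def simulate_latency(due_per_sec, started_per_sec, seconds):
    """Toy FIFO scheduler: return the latency of every check it starts.

    Each second, due_per_sec checks become due and at most started_per_sec
    of the oldest waiting checks are started. Latency = start time - due time.
    """
    waiting = deque()  # due times of checks not yet started
    latencies = []
    for now in range(seconds):
        waiting.extend([now] * due_per_sec)
        for _ in range(min(started_per_sec, len(waiting))):
            latencies.append(now - waiting.popleft())
    return latencies


if __name__ == "__main__":
    # 5% more work due than the scheduler starts: latency climbs steadily.
    lat = simulate_latency(due_per_sec=105, started_per_sec=100, seconds=3600)
    print(f"max latency after an hour: {max(lat)} s")
```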

 

Thanks,

Dan

 





-- 
Never,Ever Cut A Deal With a Dragon 
 
 
Next year I will be doing the London to Paris bike ride to 
raise money for the DogTrust (www.dogstrust.co.uk) .
Please Sponsor me at http://www.justgiving.com/Assaf-Flatto

Re: [Nagios-users] high latency

2010-12-02 Thread C. Bensend

 Yeah, for giggles I went back further through the archives last night
 and found stuff back to 2.x series, and not much has seemed to help.  I
 killed some of my mis-behaving active checks, and that dropped to about
 20 seconds, then went up to about 35-50.  So while that's better, I have
 A LOT more hosts and service checks to add, and am afraid it'll go nuts
 when I dump more on.  I think I've tried about all the config options I
 could find and some helped, some didn't seem to, but  there should be
 plenty of horsepower on the machine to run this much faster so not sure
 why it's not.

Hey Dan,

   I too have been wrestling alligators with service and host
check latencies averaging around 60s, and increasing to 100+
(sometimes to 300) after a few reloads during the day.

   This morning, I enabled the use_large_installation_tweaks
option.  As of a minute ago, my host check latency is now
averaging 2.116s, and service check latency is averaging 0.748s.

   I didn't see if you had tried this yet, it might be something
to consider.

Benny


-- 
No matter how many shorts we have in the system, my guards will
be instructed to treat every surveillance camera malfunction as a
full-scale emergency.
   -- Peter Anspach's Evil Overlord List, #67





Re: [Nagios-users] high latency

2010-12-02 Thread Daniel Wittenberg
Yeah, been running that since day one, since when rollout is done we'll
probably have about 18k servers and around 3 million service checks...

I can probably post my relevant config options if someone wants to peek.

Dan

-Original Message-
From: C. Bensend [mailto:be...@bennyvision.com] 
Sent: Thursday, December 02, 2010 10:46 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] high latency


 Yeah, for giggles I went back further through the archives last night
 and found stuff back to 2.x series, and not much has seemed to help.  I
 killed some of my mis-behaving active checks, and that dropped to about
 20 seconds, then went up to about 35-50.  So while that's better, I have
 A LOT more hosts and service checks to add, and am afraid it'll go nuts
 when I dump more on.  I think I've tried about all the config options I
 could find and some helped, some didn't seem to, but  there should be
 plenty of horsepower on the machine to run this much faster so not sure
 why it's not.

Hey Dan,

   I too have been wrestling alligators with service and host
check latencies averaging around 60s, and increasing to 100+
(sometimes to 300) after a few reloads during the day.

   This morning, I enabled the use_large_installation_tweaks
option.  As of a minute ago, my host check latency is now
averaging 2.116s, and service check latency is averaging 0.748s.

   I didn't see if you had tried this yet, it might be something
to consider.

Benny


-- 
No matter how many shorts we have in the system, my guards will
be instructed to treat every surveillance camera malfunction as a
full-scale emergency.
   -- Peter Anspach's Evil Overlord List, #67






Re: [Nagios-users] high latency

2010-12-02 Thread Andreas Ericsson
On 12/02/2010 04:59 PM, Daniel Wittenberg wrote:
 Yeah, for giggles I went back further through the archives last night
 and found stuff back to 2.x series, and not much has seemed to help.  I
 killed some of my mis-behaving active checks, and that dropped to about
 20 seconds, then went up to about 35-50.  So while that's better, I have
 A LOT more hosts and service checks to add, and am afraid it'll go nuts
 when I dump more on.  I think I've tried about all the config options I
 could find and some helped, some didn't seem to, but  there should be
 plenty of horsepower on the machine to run this much faster so not sure
 why it's not.
 
 
 
 Dan
 
 
 
 From: Assaf Flatto [mailto:nag...@flatto.net]
 Sent: Wednesday, December 01, 2010 11:26 AM
 To: Nagios Users List
 Cc: Daniel Wittenberg
 Subject: Re: [Nagios-users] high latency
 
 
 
 Dan
 
 There were a couple of discussions on the list that dealt with latency
 issues.
 
 Have you tried looking at the list archives about the topic?
 
 Assaf
 
 
 On 01/12/10 16:00, Daniel Wittenberg wrote:
 
 I've been watching my latency graphs, and they show 2000 seconds for some
 service and host checks. What I don't understand is that I still have idle
 time on the CPU (quad processor), so if the server isn't in trouble, why
 am I seeing such high latency? Or maybe I misunderstand how latency is
 calculated? I do have 9 service checks that are failing on about 700
 hosts, if that matters at all. I'm trying to tweak the performance to the
 max on this, so any insight is welcome.
 

Ditch your performance-data processing and see if that helps. You might
also want to get rid of embedded perl. It's been known to cause really
weird errors (although primarily memory leaks).

You'll also want to get rid of obsessive host and service commands.
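This sketch checks a main config against the suggestions in this thread: performance-data processing, embedded perl and obsessive commands, plus the `use_large_installation_tweaks` option mentioned earlier. The "suggested" values are this thread's advice, not universal defaults. A directive missing from the file is reported too, because its effective value then comes from the compiled-in default. `audit_nagios_cfg` is a hypothetical helper.

```python
import sys

# Directives discussed in this thread, mapped to the value that trims
# per-check overhead. These reflect the thread's advice, not defaults.
SUGGESTED = {
    "use_large_installation_tweaks": "1",
    "enable_embedded_perl": "0",
    "process_performance_data": "0",
    "obsess_over_services": "0",
    "obsess_over_hosts": "0",
}


def audit_nagios_cfg(cfg_text):
    """Return {directive: (current, suggested)} for every mismatch.

    A directive absent from the file is reported with current=None.
    Comment lines (starting with '#') and blank lines are skipped.
    """
    current = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        current[key.strip()] = value.strip()
    return {key: (current.get(key), want)
            for key, want in SUGGESTED.items()
            if current.get(key) != want}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python audit.py /path/to/nagios.cfg
    with open(sys.argv[1]) as f:
        for key, (have, want) in audit_nagios_cfg(f.read()).items():
            print(f"{key}: currently {have!r}, thread suggests {want!r}")
```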

How large is your installation and what hardware and system are you
running it on?

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-02 Thread Andreas Ericsson
On 12/02/2010 06:05 PM, Daniel Wittenberg wrote:
 Yeah, been running that since day one, since when rollout is done we'll
 probably have about 18k servers and around 3 million service checks...
 

170 services per host? Sounds like an awful lot of switches. I'd use
some cleverness to grab snmp-info once and parse the data afterwards
if I were you.
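The "grab once, parse afterwards" suggestion is a caching pattern: one expensive query per host, with many service results derived from it. Those results could then be submitted as passive results. The sketch below uses a hypothetical `fetch_bulk()` stand-in instead of a real SNMP walk, and the data keys and rules are made up for illustration. For reference, 3,000,000 checks over 18,000 hosts works out to about 167 services per host, hence the 170 figure above.

```python
from collections import Counter

fetch_calls = Counter()  # how many times each host was queried


def fetch_bulk(host):
    """Hypothetical stand-in for one expensive query (e.g. a single SNMP walk)."""
    fetch_calls[host] += 1
    return {"ifInErrors.1": 0, "ifInErrors.2": 7, "ifOperStatus.1": 1}


def evaluate_host(host, service_rules):
    """Query the host once, then derive every service result from that data."""
    data = fetch_bulk(host)
    return {name: rule(data) for name, rule in service_rules.items()}


# Illustrative per-service rules; the OID-style keys are made up.
RULES = {
    "port1-errors": lambda d: "OK" if d["ifInErrors.1"] == 0 else "WARNING",
    "port2-errors": lambda d: "OK" if d["ifInErrors.2"] == 0 else "WARNING",
    "port1-status": lambda d: "OK" if d["ifOperStatus.1"] == 1 else "CRITICAL",
}

if __name__ == "__main__":
    print(evaluate_host("switch01", RULES))
    print(f"services per host: {3_000_000 / 18_000:.0f}")
```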

For that kind of installation, you'll need to use a distributed setup
of some sort. merlin, dnx and apparently mod-gearman should get you
going in the right direction.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-02 Thread Daniel Wittenberg
Not using SNMP for any of the checks, and most are passive checks.  For
the few active checks we are probably going to be using dnx.  

Embedded perl is interesting though; I hadn't tried that, and thought it was
supposed to help with performance.  I don't think we have any obsessive
stuff running right now.

Right now hardware is 4 proc vmware esx, 4GB RAM.  For production there
will be 12 of those boxes with the number of hosts being about 1200-1500
per nagios server.

Dan

-Original Message-
From: Andreas Ericsson [mailto:a...@op5.se] 
Sent: Thursday, December 02, 2010 11:19 AM
To: Nagios Users List
Cc: Daniel Wittenberg
Subject: Re: [Nagios-users] high latency

On 12/02/2010 06:05 PM, Daniel Wittenberg wrote:
 Yeah, been running that since day one, since when rollout is done we'll
 probably have about 18k servers and around 3 million service checks...
 

170 services per host? Sounds like an awful lot of switches. I'd use
some cleverness to grab snmp-info once and parse the data afterwards
if I were you.

For that kind of installation, you'll need to use a distributed setup
of some sort. merlin, dnx and apparently mod-gearman should get you
going in the right direction.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] high latency

2010-12-02 Thread Daniel Wittenberg
Someone else noticed that Nagios is generating a ton of minor page
faults. I'm curious whether that's normal and whether it could be causing
some of the latency in the checks. I've also set up tmpfs for
status.dat and the checkresults directory to ease some of the disk I/O,
since we're on a SAN-backed VM host.
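To confirm that status.dat and the checkresults directory really live on tmpfs, you can look up which mount contains each path. Those paths are set by `status_file` and `check_result_path` in nagios.cfg. The following is a minimal Python sketch that assumes the Linux /proc/mounts layout (device, mount point, filesystem type, ...). The paths in the example are hypothetical source-install defaults, not necessarily yours.

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses /proc/mounts layout: device, mount point, fstype, ...
    Escaped characters in mount points (e.g. \\040 for a space) are not decoded.
    """
    target = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains = target == mount_point or target.startswith(prefix)
        # The longest matching mount point is the one that actually holds the path.
        if contains and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    # Hypothetical paths; use the status_file and check_result_path
    # values from your own nagios.cfg.
    for p in ("/usr/local/nagios/var/status.dat",
              "/usr/local/nagios/var/spool/checkresults"):
        print(p, "->", fs_type_for(p, mounts))
```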

I turned off embedded Perl this morning and our latency has been holding
at 10 seconds so far, so that seemed to help a lot.

Dan

-Original Message-
From: Daniel Wittenberg [mailto:daniel.wittenberg.r...@statefarm.com] 
Sent: Thursday, December 02, 2010 11:42 AM
To: Andreas Ericsson; Nagios Users List
Subject: Re: [Nagios-users] high latency

Not using SNMP for any of the checks, and most are passive checks.  For
the few active checks we are probably going to be using dnx.  

Embedded Perl is interesting, though. I hadn't tried that; I thought it was
supposed to help with performance. I don't think we have any obsessive
stuff running right now.

Right now the hardware is a 4-processor VMware ESX VM with 4GB RAM. For
production there will be 12 of those boxes, with about 1200-1500 hosts
per Nagios server.

Dan

-Original Message-
From: Andreas Ericsson [mailto:a...@op5.se] 
Sent: Thursday, December 02, 2010 11:19 AM
To: Nagios Users List
Cc: Daniel Wittenberg
Subject: Re: [Nagios-users] high latency

On 12/02/2010 06:05 PM, Daniel Wittenberg wrote:
 Yeah, been running that since day one, since when rollout is done
we'll
 probably have about 18k servers and around 3 million service checks...
 

170 services per host? Sounds like an awful lot of switches. I'd use
some cleverness to grab snmp-info once and parse the data afterwards
if I were you.

For that kind of installation, you'll need to use a distributed setup
of some sort. merlin, dnx and apparently mod-gearman should get you
going in the right direction.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.


--
Increase Visibility of Your 3D Game App  Earn a Chance To Win $500!
Tap into the largest installed PC base  get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for
grabs.
http://p.sf.net/sfu/intelisp-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

--
Increase Visibility of Your 3D Game App  Earn a Chance To Win $500!
Tap into the largest installed PC base  get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] high latency

2010-12-01 Thread Assaf Flatto

Dan,

There have been a couple of discussions on the list dealing with latency
issues.

Have you tried searching the list archives on the topic?

Assaf


On 01/12/10 16:00, Daniel Wittenberg wrote:


I've been watching my latency graphs, and they show 2000 seconds for
some service and host checks. What I don't understand is that the CPU
(quad processor) still has idle time, so if the server isn't in trouble,
why am I seeing such high latency? Or maybe I misunderstand how latency
is calculated? I do have 9 service checks that are failing on about 700
hosts, if that matters at all. I'm trying to tweak performance to the
max on this, so any insight is welcome.
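The Nagios documentation defines check latency as the difference between the time a check was scheduled to run and the time it actually started. The plugin's execution time is reported separately. The following is a minimal Python sketch of that calculation, assuming you already have (scheduled, started) timestamp pairs. `CheckRun` and `latency_summary` are hypothetical helpers, not part of Nagios. The sketch prints the same min/max/avg triple that nagiostats shows.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class CheckRun:
    scheduled: float  # epoch seconds the scheduler intended to start the check
    started: float    # epoch seconds the check actually started


def latency_summary(runs: Iterable[CheckRun]) -> Tuple[float, float, float]:
    """Return (min, max, avg) latency in seconds.

    Latency is start time minus scheduled time; how long the plugin
    then takes to run (execution time) is not included.
    """
    latencies = [r.started - r.scheduled for r in runs]
    if not latencies:
        raise ValueError("no check runs given")
    return min(latencies), max(latencies), sum(latencies) / len(latencies)


if __name__ == "__main__":
    runs = [
        CheckRun(scheduled=1000.0, started=1000.5),
        CheckRun(scheduled=1000.0, started=1012.0),  # waited 12 s in the queue
        CheckRun(scheduled=1060.0, started=1063.5),
    ]
    mn, mx, avg = latency_summary(runs)
    print(f"Latency: {mn:.3f} / {mx:.3f} / {avg:.3f} sec")
```

Latency measures time spent waiting to start, so it can grow while the CPU is partly idle. This happens whenever the scheduler cannot start checks as fast as they come due.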



Thanks,

Dan




--
Never,Ever Cut A Deal With a Dragon


Next year I will be doing the London to Paris bike ride to
raise money for the DogTrust (www.dogstrust.co.uk) .
Please Sponsor me at http://www.justgiving.com/Assaf-Flatto

--
Increase Visibility of Your 3D Game App  Earn a Chance To Win $500!
Tap into the largest installed PC base  get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] High latency on small installation

2010-07-21 Thread Marc Powell

On Jul 20, 2010, at 11:36 AM, Assaf Flatto wrote:

 Hello All
 
 
 I am having a problem with very high latency on my main Nagios server 
 (3.2.0 from source on SLES 10.3 x64).
 I recompiled the core with embedded Perl, and that helped lower the 
 latency for a while, but it keeps growing to values that are not 
 reasonable for a Nagios installation of this size.

 event_broker_options=-1
 broker_module=/usr/local/nagios/bin/ndomod-3x.o 

Is it better if you disable the event broker? If so, search the archives for 
information about it and database tuning. There has been somewhat recent 
discussion about higher latency as the database grows in size.

 process_performance_data=1
 host_perfdata_command=process-host-perfdata
 service_perfdata_command=process-service-perfdata

Is it better if you disable this? If so, see if there's any performance tuning 
information for the addon you are using.

 enable_environment_macros=1

Disable this if you are not explicitly using it. Chances are very high that you 
are not.
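You can check the settings Marc asks about mechanically. The following is a minimal Python sketch that assumes a main config file in the usual `key=value` format with `#` comments. The `parse_main_cfg` and `latency_suspects` helpers are hypothetical. The sketch lists the directives from these threads that are worth toggling one at a time while watching latency. The list reflects advice given on the list, not an official Nagios rule. Note that disabling embedded Perl helped one poster, while another found that enabling it helped.

```python
from typing import Dict, List

# Directives discussed in these threads, paired with the value worth
# testing (toggle one at a time while watching latency).
# This list reflects the thread's advice, not an official Nagios rule.
WATCH = {
    "enable_environment_macros": "1",
    "enable_embedded_perl": "1",
    "process_performance_data": "1",
}


def parse_main_cfg(text: str) -> Dict[str, List[str]]:
    """Parse key=value lines; keys such as broker_module may repeat."""
    settings: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def latency_suspects(settings: Dict[str, List[str]]) -> List[str]:
    """List WATCH settings that are switched on, plus every loaded broker module."""
    hits = [f"{key}={value}" for key, value in WATCH.items()
            if value in settings.get(key, [])]
    hits += [f"broker_module={m}" for m in settings.get("broker_module", [])]
    return hits


if __name__ == "__main__":
    # Settings quoted in this thread; point this at your own nagios.cfg instead.
    sample = """
event_broker_options=-1
broker_module=/usr/local/nagios/bin/ndomod-3x.o
process_performance_data=1
enable_environment_macros=1
"""
    for item in latency_suspects(parse_main_cfg(sample)):
        print(item)
```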

--
Marc
--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] High latency on small installation

2010-07-21 Thread Assaf Flatto

 event_broker_options=-1
 broker_module=/usr/local/nagios/bin/ndomod-3x.o 
 

 Is it better if you disable the event broker? If so, search the archives for 
 information about it and database tuning. There has been somewhat recent 
 discussion about higher latency as the database grows in size.

   
I was part of that thread, and I cannot remove it since we need it
for NagVis.
 process_performance_data=1
 host_perfdata_command=process-host-perfdata
 service_perfdata_command=process-service-perfdata
 

 Is it better if you disable this? If so, see if there's any performance 
 tuning information for the addon you are using.

   
Again, it's needed for pnp4nagios, which we use for our graphs.
 enable_environment_macros=1
 

 Disable this if you are not explicitly using it. Chances are very high that 
 you are not.
   
That hit the spot right on.
Active Service Latency: 5.034 / 462.816 / 197.907 sec and dropping.

Thanks Marc.

-- 
Never,Ever Cut A Deal With a Dragon 




--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] High latency on small installation

2010-07-21 Thread Assaf Flatto
Assaf Flatto wrote:
 event_broker_options=-1
 broker_module=/usr/local/nagios/bin/ndomod-3x.o 
 
   
 Is it better if you disable the event broker? If so, search the archives for 
 information about it and database tuning. There has been somewhat recent 
 discussion about higher latency as the database grows in size.

   
 
 I was part of that thread , and i can not remove it since  we need it 
 for the nagviz .
   
 process_performance_data=1
 host_perfdata_command=process-host-perfdata
 service_perfdata_command=process-service-perfdata
 
   
 Is it better if you disable this? If so, see if there's any performance 
 tuning information for the addon you are using.

   
 
 Again - needed for pnp4nagios  we use for our graphs.
   
 enable_environment_macros=1
 
   
 Disable this if you are not explicitly using it. Chances are very high that 
 you are not.
   
 
 That hit the spot right one .
 Active Service Latency: 5.034 / 462.816 / 197.907 sec
  and dropping
   
 Thanks Marc.

   

I guess my joy was premature. After the change it dropped all the way
to 170 sec and then started climbing back up; now it stands at

Active Service Latency: 5.679 / 441.738 / 384.102 sec

I have removed the NDO broker module, and that helped by lowering the
latency to

Active Service Latency: 0.129 / 441.738 / 268.447 sec

It does mean I will lose NagVis and the nagiosBP plugin, but for now it
will have to do.
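These readings are easier to compare once the numbers are pulled out of the nagiostats line. The following is a small Python sketch that assumes the `Active Service Latency: min / max / avg sec` layout quoted above. The regex and `parse_latency` are illustrative, not a nagiostats API. It parses both readings and computes the drop in the average. The maximum is identical in both readings, so the average is the more informative figure here.

```python
import re
from typing import Tuple

# nagiostats prints min / max / avg; whitespace after the colon varies.
LATENCY_RE = re.compile(
    r"Active Service Latency:\s*([\d.]+)\s*/\s*([\d.]+)\s*/\s*([\d.]+)\s*sec"
)


def parse_latency(line: str) -> Tuple[float, float, float]:
    """Extract (min, max, avg) seconds from an 'Active Service Latency' line."""
    match = LATENCY_RE.search(line)
    if match is None:
        raise ValueError(f"no latency triple in {line!r}")
    lo, hi, avg = (float(g) for g in match.groups())
    return lo, hi, avg


if __name__ == "__main__":
    before = parse_latency("Active Service Latency: 5.679 / 441.738 / 384.102 sec")
    after = parse_latency("Active Service Latency:0.129 / 441.738 / 268.447 sec")
    print(f"average latency dropped by {before[2] - after[2]:.1f} s")
```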

-- 
Never,Ever Cut A Deal With a Dragon 




--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] High latency on small installation

2010-07-21 Thread Assaf Flatto
Could it be that the more service checks I move off the main Nagios
server and onto the monitored hosts to run via NRPE, the more the latency
will increase?
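One simplified way to reason about this question is Little's law. Assume each active check, including a `check_nrpe` call, holds one of the Nagios server's concurrent-check slots (see `max_concurrent_checks` in nagios.cfg) for its whole duration. Under that assumption, the server can start at most slots / average duration checks per second. Moving a plugin behind NRPE can make each check take longer (network round trip plus remote execution), which lowers that ceiling. This sketch treats the slot count and durations as hypothetical inputs and ignores result reaping and other scheduler overhead.

```python
def sustainable_rate(concurrent_slots: int, avg_check_seconds: float) -> float:
    """Checks/second the server can start if every slot stays busy (Little's law)."""
    return concurrent_slots / avg_check_seconds


def required_rate(num_checks: int, interval_seconds: float) -> float:
    """Checks/second needed to run every check once per interval."""
    return num_checks / interval_seconds


def keeps_up(num_checks: int, interval_seconds: float,
             concurrent_slots: int, avg_check_seconds: float) -> bool:
    """False means the queue, and therefore latency, keeps growing."""
    return required_rate(num_checks, interval_seconds) <= sustainable_rate(
        concurrent_slots, avg_check_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 checks every 5 minutes, 10 concurrent slots.
    for seconds in (0.5, 4.0):  # e.g. a fast local plugin vs. a slower NRPE round trip
        print(seconds, keeps_up(1000, 300, 10, seconds))
```

In this model, it is per-check duration that matters, not CPU usage. A slower remote check can push the required rate past the sustainable one even when the Nagios server itself is mostly idle.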

Assaf


Marc Powell wrote:
 On Jul 20, 2010, at 11:36 AM, Assaf Flatto wrote:

   
 Hello All


 I am having a problem with very high latency on my main nagios server 
 (3.2.0 from source on SLES 10.3 x64).
 I recompiled the core with the  embedded perl and that helped for a 
 while to lower the latency but it keeps growing to times that are not 
 reasonable for this size of a nagios installation .
 

   
 event_broker_options=-1
 broker_module=/usr/local/nagios/bin/ndomod-3x.o 
 

 Is it better if you disable the event broker? If so, search the archives for 
 information about it and database tuning. There has been somewhat recent 
 discussion about higher latency as the database grows in size.

   
 process_performance_data=1
 host_perfdata_command=process-host-perfdata
 service_perfdata_command=process-service-perfdata
 

 Is it better if you disable this? If so, see if there's any performance 
 tuning information for the addon you are using.

   
 enable_environment_macros=1
 

 Disable this if you are not explicitly using it. Chances are very high that 
 you are not.

 --
   


-- 
Never,Ever Cut A Deal With a Dragon 




--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null