Re: [Nagios-users-br] Monitoring tomcat/JVM

2010-06-10 Thread Marcel
tomcat   18085 1  0 Jun09 ? 00:11:26   /usr/java/default/bin/java
-server -(...) -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=8998
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false

excerpt from the process list of a Tomcat running with JMX enabled

JMX is not exclusive to the container; it is a feature of the JVM, so every
servlet container out there can be instrumented through JMX, even Jetty.

Cheers
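As a rough illustration of the process listing above, the sketch below pulls the JMX-related system properties out of a Java command line, so a monitoring script can tell whether remote JMX is switched on and which port it uses. The function name `jmx_settings` is hypothetical; the property names are the ones shown in the listing.

```python
import shlex

JMX_FLAG = "-Dcom.sun.management.jmxremote"


def jmx_settings(command_line):
    """Collect the JMX remote properties found in a java command line.

    "enabled" reflects only the bare jmxremote flag; the other keys are
    the property suffixes (port, ssl, authenticate) with string values.
    """
    settings = {"enabled": False}
    for arg in shlex.split(command_line):
        if arg == JMX_FLAG:
            settings["enabled"] = True
        elif arg.startswith(JMX_FLAG + "."):
            # e.g. "port=8998" -> ("port", "8998")
            name, _, value = arg[len(JMX_FLAG) + 1:].partition("=")
            settings[name] = value
    return settings


# Command line taken from the listing, with the elided options left out.
cmd = (
    "/usr/java/default/bin/java -server "
    "-Dcom.sun.management.jmxremote "
    "-Dcom.sun.management.jmxremote.port=8998 "
    "-Dcom.sun.management.jmxremote.ssl=false "
    "-Dcom.sun.management.jmxremote.authenticate=false"
)
print(jmx_settings(cmd))
```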

2010/6/9 Jose Oliveira jotag...@gmail.com

 Marcel

 JMX is only for JBoss, isn't it? Does it work for Tomcat too?



 On June 9, 2010 at 15:30, Marcel mits...@gmail.com wrote:

  Yo,
 
  In my opinion, the best way to monitor these resources is via JMX.
 
  There is a plugin on Nagios Exchange that does all of this!
 
  Cheers
 
  2010/6/9 Renato T Melo tamie...@gmail.com
 
   Hello everyone,
  
   I need to monitor Tomcat in the following respects:
  
   1- Memory utilization (%)
   2- JVM (Java Virtual Machine) utilization (%)
   3- Tomcat connections
   4- Connection timeouts
   5- Apache/Tomcat threads
   6- Number of database connections in use
  
   The thing is, I am not very familiar with all this paraphernalia
   (tomcat/jvm/jboss)!
   Has anyone on the list been through this and can help us?
  
   Thanks.
   Renato.
  
  
  
 
 --
   ThinkGeek and WIRED's GeekDad team up for the Ultimate
   GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
   lucky parental unit.  See the prize list and enter to win:
   http://p.sf.net/sfu/thinkgeek-promo
   --
   Nagios-users-br@lists.sourceforge.net mailing list
   https://lists.sourceforge.net/lists/listinfo/nagios-users-br
   Wiki: http://nagios-br.sf.net/wiki
  
 
 
 --
 Regards
 JGeraldo



[Nagios-users] Query on check_http timeout option

2010-06-10 Thread Sharad Ganapathy
Hello,

I want to understand the timeout option in check_http. The help output
states:
-t, --timeout=INTEGER
    Seconds before connection times out (default: 10)

I monitor a web service to check connectivity, and I also pass some
parameters to get some content back. The download usually takes around
20 seconds and occasionally well over a minute or two. I have configured
my check like this:

$ROOT/libexec/nagios/check_http -u 'some URL' -I <API Hostname> -t 10 -c 20 -p 4080 -A 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; pagechecks mon)'

The check quite often errors out with the message 'Socket timeout after 10
seconds'.

When I connect to the URL manually (telnet/curl), the connection time is
well below 10 seconds.

Does the timeout parameter limit only the time it takes to establish the TCP
connection, or the time the whole check takes to complete?


Thanks

Sharad
--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Query on check_http timeout option

2010-06-10 Thread Daniel Lockard
I believe it covers the time check_http takes to connect and download
the page.
Is the total time when you connect manually with curl greater than 10 seconds?
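
The two readings of -t can be compared with a little arithmetic. The sketch below is only a model, not check_http's code: `check_outcome` is a hypothetical helper, the 1-second connect time is an assumed stand-in for "well below 10 seconds", and the 20-second download is the figure from the original question. Only the whole-check reading reproduces the timeout reported in the thread.

```python
def check_outcome(connect_seconds, download_seconds, timeout_seconds,
                  covers_whole_check=True):
    """Return "OK" or a timeout message for one simulated check run."""
    elapsed = connect_seconds
    if covers_whole_check:
        # Under this reading the download also counts against the timeout.
        elapsed += download_seconds
    if elapsed > timeout_seconds:
        return f"Socket timeout after {timeout_seconds} seconds"
    return "OK"


# Assumed 1 s connect, 20 s download, -t 10, under both readings.
for whole in (False, True):
    print(whole, check_outcome(1, 20, 10, covers_whole_check=whole))
```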


Daniel H Lockard



On Thu, Jun 10, 2010 at 1:07 AM, Sharad Ganapathy sharadg...@gmail.com wrote:
 Hello,
 I want to understand the timeout option in check_http. From the help option,
 it states :
 -t, --timeout=INTEGER
     Seconds before connection times out (default: 10)
 I monitor a webservice to check for connectivity and also pass some
 parameters to get some content back. Usually the download transfer takes
 around 20 seconds and occasionally it takes well over a minute or two. I
 have configured my check in this fashion.
 $ROOT/libexec/nagios/check_http -u 'some URL' -I API Hostname  -t 10 -c 20
 -p 4080 -A 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; pagechecks
 mon)'
 I am seeing the check error out with the message 'Socket timeout after 10
 seconds' quite often.
 When I manually connect ( telnet/curl) the URL the connection time is well
 below 10 seconds.
 Is the timeout parameter used to check the time it takes to establish a TCP
 connection or to govern the time the check  took  to complete ?

 Thanks
 Sharad


Re: [Nagios-users] Query on check_http timeout option

2010-06-10 Thread Sharad Ganapathy
Daniel Lockard wrote:
 I believe it is the time the check_http took to connect, and download
 the page.
 Is the total time when you manually connect using curl greater than 10 
 seconds?


 Daniel H Lockard




Yes. Sometimes the total time (time to connect + download the content)
goes up to 1 minute.


Sharad




Re: [Nagios-users] Query on check_http timeout option

2010-06-10 Thread Marc Powell

On Jun 10, 2010, at 3:55 AM, Sharad Ganapathy wrote:

 Yes. Sometimes the total time ( time to connect + download the content) 
 goes upto 1 minute.

It can go as long as you want as long as you also increase 
service_check_timeout in nagios.cfg.
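
A minimal sketch of that relationship, assuming nagios.cfg uses plain key=value lines with # comments: it reads service_check_timeout and checks that the plugin's -t value stays below it. The helper names and the sample value of 60 are illustrative assumptions, not taken from the thread.

```python
def read_directives(cfg_text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    directives = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        directives[key.strip()] = value.strip()
    return directives


def plugin_timeout_fits(cfg_text, plugin_timeout):
    """True if the plugin's -t value is below service_check_timeout."""
    limit = int(read_directives(cfg_text)["service_check_timeout"])
    return plugin_timeout < limit


sample_cfg = """
# excerpt from a hypothetical nagios.cfg
service_check_timeout=60
"""
print(plugin_timeout_fits(sample_cfg, 10))
print(plugin_timeout_fits(sample_cfg, 120))
```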

--
Marc




Re: [Nagios-users] Query on check_http timeout option

2010-06-10 Thread Sharad Ganapathy
On 10 June 2010 18:16, Marc Powell li...@xodus.org wrote:


 On Jun 10, 2010, at 3:55 AM, Sharad Ganapathy wrote:

  Yes. Sometimes the total time ( time to connect + download the content)
  goes upto 1 minute.

 It can go as long as you want as long as you also increase
 service_check_timeout in nagios.cfg.


Right. But the check times out on the host itself (it is a passive check).
Nagios has never complained about not receiving results from this check
(UNKNOWN state). My question is whether the timeout in check_http applies
only to establishing the TCP connection or to the overall completion of the
check (time to connect + download).

Thanks

Sharad

Re: [Nagios-users] a bit [OT] PNP4nagios help

2010-06-10 Thread Assaf Flatto
Thanks for the help, but I managed to solve the issue.

The problem was that pnp4nagios was looking in the wrong place for the RRD
data provided by Nagios.

Once I fixed the paths, the graphs started working again.

Thanks for the nudge.

Assaf


Guy Waugh wrote:
 Hi Assaf,
  
 Have you restarted nagios lately?
  
 Have the permissions on the pnp4nagios files (or the directory they 
 reside in) changed?
  
 Has anything changed in nagios.cfg that might affect this?
  
 How are you running pnp4nagios? Synchronous mode, bulk mode or bulk 
 mode with NPCD? Are the nagios perfdata files being populated 
 correctly etc.?
  
 Cheers,
 Guy.

 On 9 June 2010 14:14, Assaf Flatto nag...@flatto.net wrote:

  Hello all,

  I'm not sure this is the right place for this, but since many of us use
  pnp4nagios, I thought I might be able to get some advice. I installed
  pnp4nagios and it worked well for more than a month, but now it seems
  to no longer generate graphs for any of the existing checks.

  I can see in the Nagios debug file that the pnp script is executed,
  so I was expecting the XML to be generated.
  When I execute the script manually with
  perl -d /usr/local/pnp4nagios/libexec/process_perfdata.pl
  I get the following Perl error output, which is what I think stops my
  graphs from being created.

  Has anyone ever encountered this issue, or does anyone know where I
  should post this query?

  Thanks

  Assaf
 
  Use of uninitialized value in concatenation (.) or string at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 1098.
   at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 1098
  main::handle_signal('ALRM') called at (eval 10)[/usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/Term/ReadKey.pm:411] line 7
  eval {...} called at (eval 10)[/usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/Term/ReadKey.pm:411] line 7
  Term::ReadKey::ReadKey(0, 'GLOB(0x7aea60)') called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/readline.pm line 2086
  readline::rl_getc called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/readline.pm line 2073
  readline::getc_with_pending() called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/readline.pm line 1649
  readline::readline('  DB1 ') called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/Perl.pm line 11
  Term::ReadLine::Perl::readline('Term::ReadLine::Perl=ARRAY(0xc242c0)', '  DB1 ') called at /usr/lib/perl5/5.8.8/perl5db.pl line 6371
  DB::readline('  DB1 ') called at /usr/lib/perl5/5.8.8/perl5db.pl line 2203
  DB::DB called at /usr/lib/perl5/5.8.8/perl5db.pl line 9425
  DB::fake::at_exit() called at /usr/lib/perl5/5.8.8/perl5db.pl line 8997
  DB::END() called at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 0
  eval {...} called at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 0

[Nagios-users] Large Installation

2010-06-10 Thread Scott Ward
We are looking to do a large installation of Nagios. Is it possible to
monitor over 800 machines and over 14000 services?

Has anyone tried anything like this? If so, how successful was it,
and how did you configure it?

~Rultax

Re: [Nagios-users] Large Installation

2010-06-10 Thread Mark Elsen
 We are looking to do an large installation of Nagios. Is it possible to
 monitor over 800 machines and over 14000 services?

 Works like a charm :-)


 Has anyone tried doing anything like this? If you have how successful was it
 and how did you configure it?


 Same as for a small installation of NAGIOS

M.



Re: [Nagios-users] Large Installation

2010-06-10 Thread Kevin Keane
Nagios does have some scalability issues, but for the most part you won't run 
into them until you get to truly huge installations.

I can see three main scalability issues: config file maintenance, the need
for one central server, and firewall issues.

Config file maintenance can be improved to some extent with careful design of 
the config files, as well as tools. It is an issue that I am running into with 
a relatively small installation with 80+ hosts and 400+ services. My 
installation is highly heterogeneous and very dynamic, which makes config file 
maintenance a nightmare. Having to restart Nagios after a configuration change 
doesn't help either. On the other hand, a network with 2000 identical machines 
is probably going to be much easier to manage than my type of network.

The central server is an obvious bottleneck. No matter how powerful the machine
and the network connection, there are only so many check results it can
handle. Fortunately, Nagios doesn't require much horsepower. Distributed
monitoring helps with this issue because the most expensive part of Nagios is 
running active checks. With distributed monitoring, the active checks can run 
on multiple smaller boxes, and then send the check results back as passive 
checks.
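
One way a result can be handed to the central server as a passive check is through Nagios's external command file, using the PROCESS_SERVICE_CHECK_RESULT command. The sketch below only formats such a line; how it travels from the satellite box to the central server is left out, and the host name, service name, and the `passive_service_result` helper are hypothetical.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):      # OK, WARNING, CRITICAL, UNKNOWN
        raise ValueError("return_code must be 0, 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}")


# Hypothetical host and service; the timestamp is fixed for repeatability.
print(passive_service_result("web01", "HTTP", 0, "HTTP OK", now=1276200000))
```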

Of course distributed monitoring compounds the config file maintenance issue, 
because you have to configure each check multiple times.

The third issue is not directly a scalability issue. Nagios is built on the
assumption of a local and mostly trusted network. It's non-trivial to get
checks working securely on remote machines without poking pretty gaping holes
in firewalls and/or frequently establishing and tearing down encrypted
connections, with the attendant processing load. There are some third-party
solutions for this issue, though.

From: Scott Ward [mailto:13.sward...@gmail.com]
Sent: Thursday, June 10, 2010 12:34 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Large Installation

Make sure to read these pages:

http://nagios.sourceforge.net/docs/3_0/tuning.html
http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Also, if you're monitoring 800 machines across WANs, you might look
into distributed monitoring:
http://nagios.sourceforge.net/docs/3_0/distributed.html

Let us know how it goes!

Thanks for the links. So the distributed monitoring described in the Nagios
docs can handle what we're trying to do? I have read in a few places that
Nagios has scalability issues.


--Matt

BTW, what are you using for your config maintenance?

We haven't decided yet. Do you have any recommendations?


~S

On Thu, Jun 10, 2010 at 2:23 PM, Matt Simmons
standalone.sysad...@gmail.com wrote:
Make sure to read these pages:

http://nagios.sourceforge.net/docs/3_0/tuning.html
http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Also, if you're monitoring 800 machines across WANs, you might look
into distributed monitoring:
http://nagios.sourceforge.net/docs/3_0/distributed.html

Let us know how it goes!

--Matt

BTW, what are you using for your config maintenance?


On Thu, Jun 10, 2010 at 1:51 PM, Scott Ward
13.sward...@gmail.com wrote:
 We are looking to do an large installation of Nagios. Is it possible to
 monitor over 800 machines and over 14000 services?

 Has anyone tried doing anything like this? If you have how successful was it
 and how did you configure it?

 ~Rultax




--
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.


[Nagios-users] Strange fluctuation in load average

2010-06-10 Thread Trisha Hoang
Hi all,
When I first installed nagios-3.2.0 with embedded Perl enabled, Nagios
experienced increasing latency, starting at 1 sec and climbing up to 300
within a few hours, until I restarted Nagios. One of the older posts
suggested recompiling Nagios *without* embedded Perl, and that resolved
the latency issue: latency is now consistently below 1 sec. However,
ever since, the system load average has fluctuated wildly, from 1 to 12 and
back down to, say, 3 within a minute. Each episode lasts 3-10 minutes
and then things calm down for about an hour. There don't seem to be any
cron jobs that could cause this kind of load, and the CPU (one quad-core) is
usually at least 50% idle, with plenty of free memory and no I/O blocking,
on CentOS 5.2. What's strange is that with Nagios compiled with embedded
Perl, the load was consistently at 2-4.
Could this be Nagios related? Please let me know if you need more
information.

-- 
Trisha Hoang | IT/Operations | Rockyou, Inc. | Phone: 408-472-3989 | AIM:
rockyoutrisha

Re: [Nagios-users] Strange fluctuation in load average

2010-06-10 Thread Matt Simmons
When you say load average, do you mean the 1 minute moving average?
And what are you using to display the load average?
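
For reference, on Linux the kernel exposes all three moving averages in /proc/loadavg, which is where tools such as uptime and top get their figures. The sketch below, with a made-up sample line and a hypothetical `parse_loadavg` helper, separates the 1, 5 and 15 minute values so that a short spike in the first number is not mistaken for sustained load.

```python
def parse_loadavg(text):
    """Split a /proc/loadavg line into its 1, 5 and 15 minute averages."""
    fields = text.split()
    one, five, fifteen = (float(f) for f in fields[:3])
    return {"1min": one, "5min": five, "15min": fifteen}


# Made-up sample in /proc/loadavg layout:
# 1min 5min 15min running/total-processes last-pid
sample = "12.04 3.10 2.25 3/512 18085"
print(parse_loadavg(sample))
```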

--Matt


On Thu, Jun 10, 2010 at 3:48 PM, Trisha Hoang tri...@rockyou.com wrote:
 Hi all,
 When I first installed nagios-3.2.0 with embedded perl enabled, nagios
 experienced increasing latency, starting at 1 sec and climbed upto 300
 within a few hours until restarting nagios. I read on one of the older post
 suggesting to recompile nagios *without* embedded perl, and that resolved
 the latency issue, with latency consistently at less than 1 sec. However,
 ever since, the system load average has fluctuated wildly from 1 to 12 and
 down to say ... 3 within a minute. This fluctuation happens 3-10 minutes
 each time and calms down for ... say an hour. There doesn't seem to be any
 cron jobs that can cause this kind of load, and cpu (1-quad core) is usually
 at least 50% idle , with plenty of free memory, no IO blocks, on Centos 5-2.
 What's strange is with nagios compiled with embedded perl, the load was
 consistently at 2-4.
 Could this be nagios related? Please let me know if you need more
 information.

 --
 Trisha Hoang | IT/Operations | Rockyou, Inc. | Phone: 408-472-3989 | AIM:
 rockyoutrisha





-- 
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.


Re: [Nagios-users] extra checkresults files being left behind

2010-06-10 Thread Mathew Walker


Nagios v3.2.0

 

And I see the check and check.ok files:

-rw------- 1 nagios nagios 291 Jun  9 07:12 checkzGuzY7
-rw------- 1 nagios nagios 280 Jun  7 21:54 checkzjh6PZ
-rw------- 1 nagios nagios 483 Jun 10 13:07 cxHWRxJ
-rw------- 1 nagios nagios   0 Jun 10 13:07 cxHWRxJ.ok


But the check* orphan files just keep showing up.  They don't relate to a 
specific host or check.  No real pattern to time, host, service, etc.  I could 
understand if the system was hitting 100% memory or CPU... but the memory is 
pretty stable in the 50-70% used range.  Load is nearly 0.00 across the board.  
The system is pretty much dedicated to my running nagios as a test box.
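
To keep an eye on how fast these files accumulate, something like the sketch below could list result files that have no '.ok' companion and are older than a few minutes. The spool path is the one quoted later in this message; the 300-second age threshold and the `orphaned_results` helper are assumptions made for illustration.

```python
import os
import time


def orphaned_results(spool_dir, min_age_seconds=300, now=None):
    """List result files with no '.ok' companion, older than min_age_seconds."""
    now = time.time() if now is None else now
    names = set(os.listdir(spool_dir))
    orphans = []
    for name in sorted(names):
        # Skip the markers themselves and any file that has its marker.
        if name.endswith(".ok") or name + ".ok" in names:
            continue
        path = os.path.join(spool_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) >= min_age_seconds:
            orphans.append(path)
    return orphans


if __name__ == "__main__":
    spool = "/usr/local/nagios/var/spool/checkresults"
    if os.path.isdir(spool):
        for path in orphaned_results(spool):
            print(path)
```

The age threshold is there so that files belonging to checks still in flight are not reported as orphans.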


-- 
Mat W. - http://www.techadre.com


 
 Date: Wed, 9 Jun 2010 20:51:35 -0700
 From: mike-nag...@5dninja.net
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] extra checkresults files being left behind
 
 Mathew Walker wrote:
  I'm running Nagios on a little VPS box checking a few hosts/services 
  (~50 checks). It's mostly a testing platform for me and checks in on my 
  other test VPS systems.
  
  However I keep seeing the extra check results data files build up in 
  /usr/local/nagios/var/spool/checkresults like:
  -rw--- 1 nagios nagios 249 Jun 7 23:45 checknbu01O
  -rw--- 1 nagios nagios 252 Jun 8 02:40 checkHxcsiJ
 
  Googled a bit and didn't come up with much relevant. Any thoughts?
 
 If I remember correctly, the parent nagios process writes out that file, 
 then forks a child. The child then runs the check, updates that file 
 and then creates a file with the same name, plus '.ok' in that 
 directory, letting the parent process know the check is completed.
 
 So, take a look at the contents of several of those files, if you're 
 lucky, you'll see that either they are for the same host, or the same 
 service check. If so, there might be something in the way that host or 
 service is getting polled that is causing the forked child to die.
 
 Also, if you're running a version older than 3.0rc1 (it's generally a
 good idea to include the version of the tool you're using when asking
 for help), then you may want to upgrade; that version fixed a bug that
 might be related: "Fixed bug with not deleting old check result files
 that contained results for invalid host/service"
 
 -- 
 Mike Lindsey
 

Re: [Nagios-users] Large Installation

2010-06-10 Thread Matt Simmons
Make sure to read these pages:

http://nagios.sourceforge.net/docs/3_0/tuning.html
http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Also, if you're monitoring 800 machines across WANs, you might look
into distributed monitoring:
http://nagios.sourceforge.net/docs/3_0/distributed.html

Let us know how it goes!

--Matt

BTW, what are you using for your config maintenance?


On Thu, Jun 10, 2010 at 1:51 PM, Scott Ward 13.sward...@gmail.com wrote:
 We are looking to do an large installation of Nagios. Is it possible to
 monitor over 800 machines and over 14000 services?

 Has anyone tried doing anything like this? If you have how successful was it
 and how did you configure it?

 ~Rultax





-- 
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.


Re: [Nagios-users] Large Installation

2010-06-10 Thread Matt Simmons
I can't say that I've solved the scalability problem, but I don't
have it, simply because I've implemented a policy under which I never
check any server over a WAN link, with the exception of another Nagios
server (plus both ends of all of the WAN links themselves).

This does require one Nagios server per site, but to me, that's an
appealing idea anyway, because I don't have a single point of failure.
Any of my Nagios installations could die completely, and I'd be
alerted by the others, just like any one internet connection could
die, and I'd still get alerts about it. In the event of a weird
failure, I can pretty much construct the network diagram based on
which links are reporting up, and from where.

It does require a certain amount of configuration overhead, but most
of that is done with templating anyway. I don't have my system laid
out exactly like I want, but I'm implementing version control
(subversion, in my case) and I have a different Nagios repository for
each site. If I had more templates (or more shared configuration
files), I would probably have a 'nagios-shared' repository, so I
wouldn't have to replicate everything manually.
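
As a small illustration of the templating idea, the sketch below renders Nagios host objects that inherit their settings from a shared template through the `use` directive, so each per-site file only carries what differs. The template name `generic-host`, the host names, the addresses, and the `render_host` helper are all hypothetical.

```python
def render_host(host_name, address, template="generic-host"):
    """Render one Nagios host object that inherits from a template."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {host_name}\n"
        f"    address    {address}\n"
        "}\n"
    )


# Hypothetical hosts for one site; addresses are documentation examples.
hosts = [("web01", "192.0.2.11"), ("web02", "192.0.2.12")]
print("\n".join(render_host(name, addr) for name, addr in hosts))
```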

As for the arrangement of my configs, it mostly follows this howto
that I did a year ago:
http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/

Hope it can help someone

--Matt


On Thu, Jun 10, 2010 at 3:55 PM, Kevin Keane subscript...@kkeane.com wrote:
 Nagios does have some scalability issues, but for the most part you won’t
 run into them until you get to truly huge installations.



 I can see three main scalability issues: config file maintenance and the
 need for one central server, and firewall issues.



 Config file maintenance can be improved to some extent with careful design
 of the config files, as well as tools. It is an issue that I am running into
 with a relatively small installation with 80+ hosts and 400+ services. My
 installation is highly heterogeneous and very dynamic, which makes config
 file maintenance a nightmare. Having to restart Nagios after a configuration
 change doesn’t help either. On the other hand, a network with 2000 identical
 machines is probably going to be much easier to manage than my type of
 network.



 The central server is an obvious bottleneck. No matter how powerful the
 machine and the network connection, there are only so many check results it
 can handle. Fortunately, Nagios doesn’t require much horsepower. Distributed
 monitoring helps with this issue because the most expensive part of Nagios
 is running active checks. With distributed monitoring, the active checks can
 run on multiple smaller boxes, and then send the check results back as
 passive checks.
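
 As a rough sketch of the "send the check results back as passive checks" step: the central server accepts each result as a PROCESS_SERVICE_CHECK_RESULT external command line. The Python below only formats such a line; the host and service names are made up for illustration, and how the line reaches the central server (for example via NSCA, as the distributed monitoring docs describe) is left out.

```python
import time

# Nagios plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
VALID_CODES = (0, 1, 2, 3)


def passive_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    host and service must match the names defined on the central server.
    """
    if return_code not in VALID_CODES:
        raise ValueError("return code must be 0, 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    # A newline would end the command early, so flatten the plugin output.
    clean = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, return_code, clean)


if __name__ == "__main__":
    # "web01" and "HTTP" are hypothetical names, not from this thread.
    print(passive_result_line("web01", "HTTP", 0, "HTTP OK"))
```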



 Of course distributed monitoring compounds the config file maintenance
 issue, because you have to configure each check multiple times.



 The third issue is not directly a scalability issue. Nagios is built with
 the assumption of a local and mostly trusted network. It’s non-trivial to
 securely get checks to work on remote machines without poking pretty gaping
 holes into firewalls, and/or frequently establishing and tearing down
 encrypted connections with the attendant processing load. There are some
 third-party solutions for this issue, though.



 From: Scott Ward [mailto:13.sward...@gmail.com]
 Sent: Thursday, June 10, 2010 12:34 PM
 To: Nagios Users List
 Subject: Re: [Nagios-users] Large Installation



Make sure to read these pages:

http://nagios.sourceforge.net/docs/3_0/tuning.html
http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Also, if you're monitoring 800 machines across WANs, you might look
into distributed monitoring:
http://nagios.sourceforge.net/docs/3_0/distributed.html

Let us know how it goes!

 Thanks for the links.  So the distributive monitoring provided by the Nagios
 docs can handle what we're trying to do?  I have read in a few places that
 Nagios has scalability issues.


--Matt

BTW, what are you using for your config maintenance?

 We haven't decided yet. Do you have any recommendations?


 ~S

 On Thu, Jun 10, 2010 at 2:23 PM, Matt Simmons
 standalone.sysad...@gmail.com wrote:

 Make sure to read these pages:

 http://nagios.sourceforge.net/docs/3_0/tuning.html
 http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

 Also, if you're monitoring 800 machines across WANs, you might look
 into distributed monitoring:
 http://nagios.sourceforge.net/docs/3_0/distributed.html

 Let us know how it goes!

 --Matt

 BTW, what are you using for your config maintenance?

 On Thu, Jun 10, 2010 at 1:51 PM, Scott Ward 13.sward...@gmail.com wrote:

 We are looking to do a large installation of Nagios. Is it possible to
 monitor over 800 machines and over 14000 services?

 Has anyone tried doing anything like this? If you have, how successful was
 it, and how did you configure it?

 ~Rultax




Re: [Nagios-users] Strange fluctuation in load average

2010-06-10 Thread Trisha Hoang
I'm using uptime to obtain the load average.
Here's a snippet of the values.
09:17:34 up 5 days, 16:06, 3 users, load average: 2.07, 2.61, 3.45
09:19:34 up 5 days, 16:08, 3 users, load average: 9.09, 4.78, 4.13
09:21:34 up 5 days, 16:10, 3 users, load average: 10.05, 6.69, 4.91
09:23:34 up 5 days, 16:12, 3 users, load average: 8.83, 7.08, 5.24
09:25:34 up 5 days, 16:14, 3 users, load average: 9.42, 8.26, 5.91
09:27:34 up 5 days, 16:16, 3 users, load average: 4.43, 6.66, 5.60
09:29:34 up 5 days, 16:18, 3 users, load average: 13.06, 8.85, 6.51
09:31:34 up 5 days, 16:20, 3 users, load average: 7.35, 8.61, 6.73
09:33:34 up 5 days, 16:22, 3 users, load average: 7.87, 7.96, 6.69
09:35:34 up 5 days, 16:24, 3 users, load average: 4.25, 6.94, 6.49
09:37:34 up 5 days, 16:26, 3 users, load average: 2.50, 5.34, 5.95
09:39:34 up 5 days, 16:28, 3 users, load average: 7.53, 6.21, 6.19
09:41:34 up 5 days, 16:30, 3 users, load average: 5.71, 6.11, 6.15
09:43:34 up 5 days, 16:32, 3 users, load average: 1.56, 4.39, 5.51
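
For anyone who wants to compare the three columns in samples like these, here is a minimal Python sketch that pulls the 1-, 5- and 15-minute values out of each uptime line and reports the min/max of each column. It assumes the "load average: a, b, c" format shown above with a dot as the decimal separator; the function names are illustrative only.

```python
import re

# Matches the trailing "load average: 2.07, 2.61, 3.45" part of an uptime line.
LOAD_RE = re.compile(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)")


def parse_load(line):
    """Return the (1-min, 5-min, 15-min) load averages from one uptime line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    return tuple(float(value) for value in match.groups())


def column_ranges(lines):
    """Return [(min, max), ...] for the 1-, 5- and 15-minute columns."""
    samples = [parse_load(line) for line in lines]
    return [(min(column), max(column)) for column in zip(*samples)]


if __name__ == "__main__":
    sample = [
        "09:17:34 up 5 days, 16:06, 3 users, load average: 2.07, 2.61, 3.45",
        "09:19:34 up 5 days, 16:08, 3 users, load average: 9.09, 4.78, 4.13",
    ]
    print(column_ranges(sample))
```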

[Nagios-users] Nagios and IBM Tivoli SRM integration

2010-06-10 Thread Francisco Seratti
Hello to everyone,

  I have a simple question: Has anyone succeeded in integrating Nagios with
this Service Request Manager tool or any other IBM Tivoli software?

  My goal is to open new service requests in this framework automatically
for every Nagios DOWN notification.

  Thank you very much in advance, any suggestion will be appreciated.

Francisco.


Re: [Nagios-users] check_yum issue

2010-06-10 Thread Terry
On Sat, Jun 5, 2010 at 9:02 AM, Kevin Keane subscript...@kkeane.com wrote:
 You would probably want to use sudo. Instead of having NRPE call check_yum 
 directly, have it call sudo check_yum, and add check_yum for the Nagios user 
 to your sudoers (make sure to not require a password, of course!)

 Be sure to keep the sudoers entry as restrictive as possible, or you may open 
 a security hole.
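
 A minimal sketch of the two pieces of configuration this suggestion implies, rendered from Python so the plugin path only has to be written once. The account name "nagios" and the sudo location /usr/bin/sudo are assumptions (NRPE may run as a different user on your system); the plugin path is the one from this thread. The sudoers line names exactly one command, which keeps the entry restrictive.

```python
def sudo_wiring(plugin_path, command_name,
                run_as_user="nagios", sudo_path="/usr/bin/sudo"):
    """Return (sudoers_line, nrpe_cfg_line) for running one plugin as root.

    run_as_user and sudo_path are assumed defaults; adjust to your system.
    """
    # Allow only this single plugin, as root, without a password prompt.
    sudoers_line = "%s ALL=(root) NOPASSWD: %s" % (run_as_user, plugin_path)
    # NRPE then calls the plugin through sudo instead of directly.
    nrpe_line = "command[%s]=%s %s" % (command_name, sudo_path, plugin_path)
    return sudoers_line, nrpe_line


if __name__ == "__main__":
    for line in sudo_wiring("/usr/lib64/nagios/plugins/check_yum", "check_yum"):
        print(line)
```

 The sudoers line should be added with visudo so a syntax error cannot lock sudo out.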

 -Original Message-
 From: Terry [mailto:td3...@gmail.com]
 Sent: Thursday, June 03, 2010 11:40 AM
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] check_yum issue

 On Thu, Jun 3, 2010 at 1:28 PM, Terry td3...@gmail.com wrote:
 Hello,

 I am trying to use check_yum:
 http://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Linux/Check_Yum/details

 It works great from the command line:
 [r...@foo ~]# yum --security check-update
 Loaded plugins: dellsysid, rhnplugin, security
 Limiting package lists to security relevant ones
 Needed 4 of 11 packages, for security

 rhn-check.noarch          0.4.20-33.el5_5.2    rhel-x86_64-server-5
 rhn-client-tools.noarch   0.4.20-33.el5_5.2    rhel-x86_64-server-5
 rhn-setup.noarch          0.4.20-33.el5_5.2    rhel-x86_64-server-5
 rhn-setup-gnome.noarch    0.4.20-33.el5_5.2    rhel-x86_64-server-5
 [r...@foo ~]# /usr/lib64/nagios/plugins/check_yum
 YUM CRITICAL: 4 Security Updates Available. 7 Non-Security Updates Available
 [r...@foo ~]# echo $?
 2

 It returns this from nagios:
 [r...@foo ~]# /usr/lib64/nagios/plugins/check_nrpe -H 10.0.0.2 -t 50 -c check_yum
 YUM OK: 0 Security Updates Available

 Here's my NRPE configuration:
 [r...@bar ~]# cat /etc/nagios/nrpe.cfg | grep check_yum
        command[check_yum]=/usr/lib64/nagios/plugins/check_yum

 What am I missing here?


 I think I fail here. This is a permissions issue, as noted in the
 description of the plugin. Is anyone doing something similar? If so,
 how is your solution architected?

 Thanks!



I think I did one better, maybe. I am having nagios call check_by_ssh,
which uses a key that is specific to this command. On the remote
side, I am configuring authorized_keys like this:
command="/usr/lib/nagios/plugins/check_yum" ssh-rsa AA.

The only thing this key can do is call check_yum on the remote end.
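
A small Python sketch of how such a forced-command entry could be assembled. Beyond the command= option described above, it also adds the standard OpenSSH restrictions no-port-forwarding, no-X11-forwarding, no-agent-forwarding and no-pty; those extra options are a suggestion, not something from the setup above, and the key string in the example is a placeholder.

```python
# Extra OpenSSH authorized_keys options that disable everything except
# running the forced command (an addition, not part of the original setup).
LOCKDOWN_OPTIONS = (
    "no-port-forwarding",
    "no-X11-forwarding",
    "no-agent-forwarding",
    "no-pty",
)


def forced_command_entry(command, public_key):
    """Build one authorized_keys line that can only run `command`."""
    if '"' in command:
        # Keep the sketch simple: embedded quotes would need escaping.
        raise ValueError("command must not contain double quotes")
    options = ['command="%s"' % command] + list(LOCKDOWN_OPTIONS)
    return ",".join(options) + " " + public_key


if __name__ == "__main__":
    # The key below is a placeholder, not a real public key.
    print(forced_command_entry("/usr/lib/nagios/plugins/check_yum",
                               "ssh-rsa AAAA-placeholder"))
```

With an entry like this, whatever command the client asks for is ignored and sshd runs the forced command instead.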
