Re: [Nagios-users-br] Monitoring Tomcat/JVM
tomcat 18085 1 0 Jun09 ? 00:11:26 /usr/java/default/bin/java -server -(...) -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8998 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

That is an excerpt from a Tomcat instance running with JMX enabled. JMX is not exclusive to the container; it is a feature of the JVM, so every servlet container out there can be instrumented via JMX, even Jetty.

Regards

2010/6/9 Jose Oliveira jotag...@gmail.com
> Marcel, isn't JMX only for JBoss? Does it work for Tomcat too?
>
> On June 9, 2010 15:30, Marcel mits...@gmail.com wrote:
>> Yo,
>> For me the best way to monitor these resources is via JMX. There's a plugin on NagiosExchange that does all of that!
>> Regards
>>
>> 2010/6/9 Renato T Melo tamie...@gmail.com
>>> Hi everyone,
>>> I need to monitor Tomcat for the following:
>>> 1- Memory usage (%)
>>> 2- JVM (Java Virtual Machine) usage (%)
>>> 3- Tomcat connections
>>> 4- Connection timeouts
>>> 5- Apache/Tomcat threads
>>> 6- Number of database connections in use
>>> The thing is, I'm not very familiar with all this paraphernalia (Tomcat/JVM/JBoss)! Has anyone on the list been through this and can help us?
>>> Thanks.
>>> Renato.
>
> --
> Regards,
> JGeraldo

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo
--
Nagios-users-br@lists.sourceforge.net mailing list
https://lists.sourceforge.net/lists/listinfo/nagios-users-br
Wiki: http://nagios-br.sf.net/wiki
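If you want to confirm which JMX port a running JVM exposes before pointing a JMX-based check at it, the `-Dcom.sun.management.jmxremote*` system properties in a command line like the one above can be pulled out mechanically. The Python sketch below is a minimal illustration, assuming you feed it one line of `ps -ef`-style output; the function names are hypothetical, and this is not the NagiosExchange plugin mentioned in the thread. It also flags that the excerpt runs JMX with both authentication and SSL turned off, which lets anyone who can reach port 8998 connect to the JMX agent.

```python
import shlex

# Every remote-JMX switch is a system property under this prefix.
JMX_PREFIX = "-Dcom.sun.management.jmxremote"


def jmx_settings(cmdline: str) -> dict:
    """Return {property: value} for each -Dcom.sun.management.jmxremote* flag.

    Flags given without '=' (such as plain -Dcom.sun.management.jmxremote)
    map to None.
    """
    settings = {}
    for arg in shlex.split(cmdline):
        if arg.startswith(JMX_PREFIX):
            key, _, value = arg[2:].partition("=")  # strip the leading "-D"
            settings[key] = value or None
    return settings


def jmx_warnings(settings: dict) -> list:
    """List missing or risky remote-JMX settings."""
    warnings = []
    if "com.sun.management.jmxremote.port" not in settings:
        warnings.append("no remote JMX port configured")
    if settings.get("com.sun.management.jmxremote.authenticate") == "false":
        warnings.append("JMX authentication disabled")
    if settings.get("com.sun.management.jmxremote.ssl") == "false":
        warnings.append("JMX SSL disabled")
    return warnings


line = ("tomcat 18085 1 0 Jun09 ? 00:11:26 /usr/java/default/bin/java -server "
        "-Dcom.sun.management.jmxremote "
        "-Dcom.sun.management.jmxremote.port=8998 "
        "-Dcom.sun.management.jmxremote.ssl=false "
        "-Dcom.sun.management.jmxremote.authenticate=false")
found = jmx_settings(line)
print(found["com.sun.management.jmxremote.port"])  # 8998
print(jmx_warnings(found))  # ['JMX authentication disabled', 'JMX SSL disabled']
```

Splitting with `shlex` rather than `str.split` keeps quoted JVM arguments that contain spaces in one piece.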
[Nagios-users] Query on check_http timeout option
Hello,

I want to understand the timeout option in check_http. The help output states:

  -t, --timeout=INTEGER
     Seconds before connection times out (default: 10)

I monitor a web service to check for connectivity, and the check also passes some parameters to get some content back. The download usually takes around 20 seconds, and occasionally it takes well over a minute or two. I have configured my check like this:

  $ROOT/libexec/nagios/check_http -u 'some URL' -I API Hostname -t 10 -c 20 -p 4080 -A 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; pagechecks mon)'

I am seeing the check error out quite often with the message 'Socket timeout after 10 seconds'. When I connect to the URL manually (telnet/curl), the connection time is well below 10 seconds.

Is the timeout parameter applied to the time it takes to establish a TCP connection, or to the time the whole check takes to complete?

Thanks
Sharad

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Query on check_http timeout option
I believe it is the time check_http takes to connect and download the page. When you connect manually using curl, is the total time greater than 10 seconds?

Daniel H Lockard
Re: [Nagios-users] Query on check_http timeout option
Daniel Lockard wrote:
> Is the total time when you manually connect using curl greater than 10 seconds?

Yes. Sometimes the total time (time to connect + download the content) goes up to 1 minute.

Sharad
Re: [Nagios-users] Query on check_http timeout option
On Jun 10, 2010, at 3:55 AM, Sharad Ganapathy wrote:
> Yes. Sometimes the total time (time to connect + download the content) goes up to 1 minute.

It can take as long as you want, as long as you also increase service_check_timeout in nagios.cfg.

--
Marc
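Marc's point, together with Sharad's original command line, suggests a simple ordering rule: response-time thresholds should sit below check_http's `-t`, and `-t` should sit below Nagios's `service_check_timeout` (which applies to active checks Nagios runs itself). The Python sketch below encodes that rule. It assumes, as Daniel suggested, that `-t` bounds the whole check, and it assumes a `service_check_timeout` of 60 seconds (the value in the stock nagios.cfg) when none is given. Under those assumptions, `-t 10 -c 20` is inconsistent: the critical threshold can never be reached because the plugin gives up at 10 seconds.

```python
def timeout_problems(critical, timeout, service_check_timeout=60, warning=None):
    """Return ordering problems among check_http -w/-c/-t and the
    nagios.cfg service_check_timeout (all values in seconds).

    Assumes -t bounds the whole check_http run, so a response-time
    threshold at or above -t can never be the reason the check alerts.
    """
    problems = []
    if warning is not None and warning >= critical:
        problems.append(f"-w {warning} is not below -c {critical}")
    if critical >= timeout:
        problems.append(
            f"-c {critical} is unreachable: check_http gives up at -t {timeout}")
    if timeout >= service_check_timeout:
        problems.append(
            f"-t {timeout} is not below service_check_timeout={service_check_timeout}")
    return problems


# Sharad's original settings: -t 10 -c 20
print(timeout_problems(critical=20, timeout=10))
# ['-c 20 is unreachable: check_http gives up at -t 10']

# A consistent alternative for downloads of up to ~2 minutes (illustrative values)
print(timeout_problems(warning=30, critical=60, timeout=150, service_check_timeout=180))
# []
```

For a check that runs on a remote host and is submitted passively, the last rule would apply to whatever timeout the remote runner enforces rather than to the central nagios.cfg.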
Re: [Nagios-users] Query on check_http timeout option
On 10 June 2010 18:16, Marc Powell li...@xodus.org wrote:
> It can go as long as you want as long as you also increase service_check_timeout in nagios.cfg.

Right. But the check times out on the host (it is a passive check). Nagios has never complained of not receiving results from this check (UNKNOWN state). My concern is whether the timeout in check_http applies only to establishing the TCP connection, or to the overall completion of the check (time to connect + download).

Thanks
Sharad
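The two readings of `-t` can be compared with a small timing model. This Python sketch is illustrative only: the phase durations are made-up values loosely matching Sharad's description (a fast TCP connect, a roughly 20-second download), and the functions model the two semantics rather than call check_http. If `-t` were only a connect timeout, these numbers would never time out; with one deadline for the whole run, they time out during the download, which matches the 'Socket timeout after 10 seconds' errors despite fast manual connects.

```python
def whole_check_expiry(phases, timeout):
    """One deadline for the whole run: return the phase in which the
    cumulative elapsed time first exceeds `timeout`, or None."""
    elapsed = 0.0
    for name, seconds in phases:
        elapsed += seconds
        if elapsed > timeout:
            return name
    return None


def connect_only_expiry(phases, timeout):
    """Only the TCP connect is bounded: return "connect" if it alone
    exceeds `timeout`, else None."""
    for name, seconds in phases:
        if name == "connect":
            return name if seconds > timeout else None
    return None


# Illustrative durations in seconds, not measurements.
phases = [("connect", 0.2), ("request", 0.1), ("download", 20.0)]
print(whole_check_expiry(phases, timeout=10))   # download
print(connect_only_expiry(phases, timeout=10))  # None
```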
Re: [Nagios-users] a bit [OT] PNP4nagios help
Thanks for the help, but I managed to solve the issue. The problem was that pnp4nagios was looking in the wrong place for the RRD data provided by Nagios. Once I fixed the paths, the graphs resumed working. Thanks for the nudge.

Assaf

Guy Waugh wrote:
> Hi Assaf,
> Have you restarted Nagios lately? Have the permissions on the pnp4nagios files (or the directory they reside in) changed? Has anything changed in nagios.cfg that might affect this? How are you running pnp4nagios: synchronous mode, bulk mode, or bulk mode with NPCD? Are the Nagios perfdata files being populated correctly, etc.?
> Cheers,
> Guy.
>
> On 9 June 2010 14:14, Assaf Flatto nag...@flatto.net wrote:
>> Hello all,
>> Not sure this is the right place for this, but since many of us use pnp4nagios, I thought I might be able to get some advice. I installed pnp4nagios and it worked well for more than a month, but now it seems it is no longer generating graphs for any of the existing checks.
>> I can see in the Nagios debug file that the pnp script is executed, and hence I was expecting the XML to be generated. When I execute the script manually:
>>   perl -d /usr/local/pnp4nagios/libexec/process_perfdata.pl
>> I get the following Perl error output, which is what I think stops my graphs from being created. Has anyone encountered this issue, or does anyone know whom/where I should post this query?
>> Thanks
>> Assaf
>>
>> Use of uninitialized value in concatenation (.) or string at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 1098.
>>  at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 1098
>>  main::handle_signal('ALRM') called at (eval 10)[/usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/Term/ReadKey.pm:411] line 7
>>  eval {...} called at (eval 10)[/usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/Term/ReadKey.pm:411] line 7
>>  Term::ReadKey::ReadKey(0, 'GLOB(0x7aea60)') called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/readline.pm line 2086
>>  readline::rl_getc called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/readline.pm line 2073
>>  readline::getc_with_pending() called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/readline.pm line 1649
>>  readline::readline(' DB<1> ') called at /usr/lib/perl5/site_perl/5.8.8/Term/ReadLine/Perl.pm line 11
>>  Term::ReadLine::Perl::readline('Term::ReadLine::Perl=ARRAY(0xc242c0)', ' DB<1> ') called at /usr/lib/perl5/5.8.8/perl5db.pl line 6371
>>  DB::readline(' DB<1> ') called at /usr/lib/perl5/5.8.8/perl5db.pl line 2203
>>  DB::DB called at /usr/lib/perl5/5.8.8/perl5db.pl line 9425
>>  DB::fake::at_exit() called at /usr/lib/perl5/5.8.8/perl5db.pl line 8997
>>  DB::END() called at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 0
>>  eval {...} called at /usr/local/pnp4nagios/libexec/process_perfdata.pl line 0
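Assaf's root cause (pnp4nagios reading RRD data from the wrong place) tends to show up on disk as RRD files that stop being updated, or as a configured directory that does not exist at all. A quick check is to list stale `.rrd` files under the directory your pnp4nagios installation is configured to use. The Python sketch below takes that directory as an argument; `/usr/local/pnp4nagios/var/perfdata` is only an assumed common default for source installs, and the one-hour staleness threshold is likewise an assumption, not a value from this thread.

```python
import os
import time


def stale_rrds(perfdata_dir, max_age_seconds=3600, now=None):
    """Return .rrd paths (relative to perfdata_dir) not modified recently.

    Raises FileNotFoundError if the directory is missing, which is itself
    a sign that the configured path is wrong.
    """
    if not os.path.isdir(perfdata_dir):
        raise FileNotFoundError(perfdata_dir)
    now = time.time() if now is None else now
    stale = []
    for root, _dirs, files in os.walk(perfdata_dir):
        for name in files:
            if name.endswith(".rrd"):
                path = os.path.join(root, name)
                if now - os.path.getmtime(path) > max_age_seconds:
                    stale.append(os.path.relpath(path, perfdata_dir))
    return sorted(stale)
```

For example, `stale_rrds("/usr/local/pnp4nagios/var/perfdata")` (assumed path) lists every graph that has not been updated in the last hour. An empty result is only reassuring if the directory actually contains `.rrd` files.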
[Nagios-users] Large Installation
We are looking to do a large installation of Nagios. Is it possible to monitor over 800 machines and over 14000 services? Has anyone tried doing anything like this? If you have, how successful was it and how did you configure it?

~Rultax
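Whether one central server copes depends mostly on how many checks per second it must start and how many run at once. A back-of-the-envelope Python sketch, assuming every one of the 14000 services is checked once every 5 minutes and that an average check takes 0.5 seconds (both figures are assumptions, not numbers from the thread), and ignoring host checks:

```python
import math


def check_load(services, interval_seconds, avg_check_seconds):
    """Return (checks started per second, average checks running at once).

    Concurrency follows Little's law: arrival rate times average duration.
    """
    rate = services / interval_seconds
    concurrency = math.ceil(rate * avg_check_seconds)
    return rate, concurrency


rate, running = check_load(services=14000, interval_seconds=300, avg_check_seconds=0.5)
print(f"{rate:.1f} checks/s, about {running} running at any moment")
# 46.7 checks/s, about 24 running at any moment
```

Halving the check interval doubles both numbers; slow checks (such as multi-second HTTP downloads) raise the concurrency without changing the rate.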
Re: [Nagios-users] Large Installation
> We are looking to do a large installation of Nagios. Is it possible to monitor over 800 machines and over 14000 services?

Works like a charm :-)

> Has anyone tried doing anything like this? If you have, how successful was it and how did you configure it?

Same as for a small installation of Nagios.

M.
Re: [Nagios-users] Large Installation
Nagios does have some scalability issues, but for the most part you won't run into them until you get to truly huge installations. I see three main scalability issues: config file maintenance, the need for one central server, and firewall issues.

Config file maintenance can be improved to some extent with careful design of the config files, as well as with tools. It is an issue that I am running into with a relatively small installation of 80+ hosts and 400+ services. My installation is highly heterogeneous and very dynamic, which makes config file maintenance a nightmare. Having to restart Nagios after a configuration change doesn't help either. On the other hand, a network with 2000 identical machines is probably going to be much easier to manage than mine.

The central server is an obvious bottleneck. No matter how powerful the machine and the network connection, there are only so many check results it can handle. Fortunately, Nagios doesn't require much horsepower. Distributed monitoring helps with this issue because the most expensive part of Nagios is running active checks. With distributed monitoring, the active checks can run on multiple smaller boxes, which then send the check results back to the central server as passive checks. Of course, distributed monitoring compounds the config file maintenance issue, because you have to configure each check multiple times.

The third issue is not directly a scalability issue. Nagios is built on the assumption of a local and mostly trusted network. It's non-trivial to get checks working securely on remote machines without poking some pretty gaping holes in firewalls and/or frequently establishing and tearing down encrypted connections, with the attendant processing load. There are some third-party solutions for this issue, though.

From: Scott Ward [mailto:13.sward...@gmail.com]
Sent: Thursday, June 10, 2010 12:34 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Large Installation

> Make sure to read these pages:
> http://nagios.sourceforge.net/docs/3_0/tuning.html
> http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html
> Also, if you're monitoring 800 machines across WANs, you might look into distributed monitoring:
> http://nagios.sourceforge.net/docs/3_0/distributed.html
> Let us know how it goes!

Thanks for the links. So can the distributed monitoring described in the Nagios docs handle what we're trying to do? I have read in a few places that Nagios has scalability issues.

> --Matt
> BTW, what are you using for your config maintenance?

We haven't decided yet. Do you have any recommendations?

~S
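In the distributed setup Kevin describes, each satellite runs its active checks and forwards the results to the central server, where they arrive as passive check results. On the central server, such a result ultimately becomes an external command line written to the Nagios command file. The Python sketch below only formats that line, assuming the standard `PROCESS_SERVICE_CHECK_RESULT` external command. It leaves out the transport (the Nagios distributed monitoring docs use NSCA for this) and the command-file path, which is set by `command_file` in nagios.cfg. The host and service names are hypothetical.

```python
import time

# Nagios plugin return codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    Format: [epoch] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    """
    if return_code not in STATES:
        raise ValueError(f"return_code must be one of {sorted(STATES)}")
    for field in (host, service):
        # ';' separates fields and a newline ends the command.
        if ";" in field or "\n" in field:
            raise ValueError(f"invalid character in {field!r}")
    ts = int(time.time() if timestamp is None else timestamp)
    output = output.replace("\n", " ")  # keep the command on one line
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


print(passive_service_result("web01", "HTTP", 2, "CRITICAL - Socket timeout", 1276200000), end="")
# [1276200000] PROCESS_SERVICE_CHECK_RESULT;web01;HTTP;2;CRITICAL - Socket timeout
```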
[Nagios-users] Strange fluctuation in load average
Hi all,

When I first installed nagios-3.2.0 with embedded Perl enabled, Nagios experienced increasing latency, starting at 1 second and climbing to 300 within a few hours, until Nagios was restarted. I read an older post suggesting recompiling Nagios *without* embedded Perl, and that resolved the latency issue: latency is now consistently below 1 second.

However, ever since then the system load average has fluctuated wildly, from 1 to 12 and back down to about 3 within a minute. Each episode lasts 3-10 minutes and then calms down for about an hour. There don't seem to be any cron jobs that could cause this kind of load, and the CPU (one quad-core) is usually at least 50% idle, with plenty of free memory and no I/O blocking, on CentOS 5.2. What's strange is that with Nagios compiled with embedded Perl, the load was consistently at 2-4.

Could this be Nagios related? Please let me know if you need more information.

--
Trisha Hoang | IT/Operations | Rockyou, Inc. | Phone: 408-472-3989 | AIM: rockyoutrisha
Re: [Nagios-users] Strange fluctuation in load average
When you say load average, do you mean the 1-minute moving average? And what are you using to display the load average?

--Matt

--
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.
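Matt's question matters because the three numbers `uptime` prints are exponentially damped moving averages of the number of runnable (and, on Linux, uninterruptible) tasks, not CPU utilization. A burst of short-lived processes can therefore push the 1-minute figure to 12 while the CPU is still half idle. The Python sketch below approximates how Linux updates the averages. It assumes one sample every 5 seconds and uses floating-point decay factors exp(-5/60), exp(-5/300) and exp(-5/900) in place of the kernel's fixed-point constants. The scenario (a steady load of 2, then one minute with 12 runnable tasks) is hypothetical.

```python
import math

SAMPLE_SECONDS = 5          # Linux samples the run queue about every 5 s
WINDOWS_MINUTES = (1, 5, 15)


def simulate_loadavg(runnable_samples, start=(0.0, 0.0, 0.0)):
    """Apply one update per sample to the 1/5/15-minute averages.

    Each update is load = load * d + n * (1 - d) with d = exp(-5 s / window),
    where n is the number of runnable tasks in that sample.
    """
    decays = [math.exp(-SAMPLE_SECONDS / (60 * m)) for m in WINDOWS_MINUTES]
    loads = list(start)
    for n in runnable_samples:
        loads = [load * d + n * (1 - d) for load, d in zip(loads, decays)]
    return tuple(loads)


# Steady load of 2, then 12 runnable tasks for 60 s (12 samples).
one, five, fifteen = simulate_loadavg([12] * 12, start=(2.0, 2.0, 2.0))
print(f"{one:.2f} {five:.2f} {fifteen:.2f}")  # 8.32 3.81 2.64
```

The 1-minute value jumps within a minute while the 15-minute value barely moves, the same shape as the 09:19 samples posted later in this thread.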
Re: [Nagios-users] extra checkresults files being left behind
Nagios v3.2.0

And I see the check and check.ok files:

  -rw------- 1 nagios nagios 291 Jun 9 07:12 checkzGuzY7
  -rw------- 1 nagios nagios 280 Jun 7 21:54 checkzjh6PZ
  -rw------- 1 nagios nagios 483 Jun 10 13:07 cxHWRxJ
  -rw------- 1 nagios nagios 0 Jun 10 13:07 cxHWRxJ.ok

But the check* orphan files just keep showing up. They don't relate to a specific host or check, and there is no real pattern in time, host, service, etc. I could understand it if the system were hitting 100% memory or CPU, but memory is pretty stable in the 50-70% used range, and load is nearly 0.00 across the board. The system is pretty much dedicated to running Nagios as a test box.

--
Mat W. - http://www.techadre.com

Date: Wed, 9 Jun 2010 20:51:35 -0700
From: mike-nag...@5dninja.net
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] extra checkresults files being left behind

> Mathew Walker wrote:
> I'm running Nagios on a little VPS box checking a few hosts/services (~50 checks). It's mostly a testing platform for me and checks in on my other test VPS systems. However, I keep seeing extra check result data files build up in /usr/local/nagios/var/spool/checkresults, like:
>   -rw------- 1 nagios nagios 249 Jun 7 23:45 checknbu01O
>   -rw------- 1 nagios nagios 252 Jun 8 02:40 checkHxcsiJ
> I googled a bit and didn't come up with much that was relevant. Any thoughts?

If I remember correctly, the parent Nagios process writes out that file, then forks a child. The child then runs the check, updates that file, and then creates a file with the same name plus '.ok' in that directory, letting the parent process know the check is complete.

So, take a look at the contents of several of those files. If you're lucky, you'll see that they are all for the same host, or the same service check. If so, there might be something in the way that host or service is being polled that is causing the forked child to die.

Also, if you're running a version older than 3.0rc1 (it's generally a good idea to include the version of the tool you're using when asking for help), you may want to upgrade; that version fixed a bug that might be related:

  Fixed bug with not deleting old check result files that contained results for invalid host/service

--
Mike Lindsey
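Following Mike's explanation (a result file is complete once a matching '.ok' marker appears next to it), orphans can be listed mechanically, and their contents grouped to look for a common host or service, as he suggests. The Python sketch below is a diagnostic helper, not part of Nagios. It assumes the spool directory from Mathew's message, treats any file without an '.ok' partner that is older than five minutes as orphaned, and assumes the result files contain `key=value` text lines such as `host_name=` and `service_description=`.

```python
import os
import time
from collections import Counter

SPOOL = "/usr/local/nagios/var/spool/checkresults"  # path from the thread


def orphan_check_results(spool_dir=SPOOL, min_age_seconds=300, now=None):
    """Names of files that have no '<name>.ok' marker and are older than
    min_age_seconds (younger ones may still be in flight)."""
    now = time.time() if now is None else now
    names = set(os.listdir(spool_dir))
    orphans = []
    for name in sorted(names):
        if name.endswith(".ok") or name + ".ok" in names:
            continue
        path = os.path.join(spool_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > min_age_seconds:
            orphans.append(name)
    return orphans


def result_fields(path):
    """Parse 'key=value' lines from a result file, skipping '#' comments."""
    fields = {}
    with open(path) as fh:
        for line in fh:
            key, sep, value = line.rstrip("\n").partition("=")
            if sep and not key.startswith("#"):
                fields[key] = value
    return fields


def orphan_targets(spool_dir=SPOOL):
    """Count orphans per (host_name, service_description) pair."""
    counts = Counter()
    for name in orphan_check_results(spool_dir):
        fields = result_fields(os.path.join(spool_dir, name))
        counts[(fields.get("host_name"), fields.get("service_description"))] += 1
    return counts
```

If one (host, service) pair dominates the output of `orphan_targets()`, that points at the polling problem Mike describes; an even spread fits Mathew's observation that there is no pattern.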
Re: [Nagios-users] Large Installation
Make sure to read these pages:
http://nagios.sourceforge.net/docs/3_0/tuning.html
http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Also, if you're monitoring 800 machines across WANs, you might look into distributed monitoring:
http://nagios.sourceforge.net/docs/3_0/distributed.html

Let us know how it goes!

--Matt

BTW, what are you using for your config maintenance?
Re: [Nagios-users] Large Installation
I can't say that I've solved the scalability problem, but I I don't have it, just because I've implemented a policy such that I never check any server over a WAN link, with the exception of another Nagios server (plus both ends of all of the WAN links themselves). This does require one Nagios server per site, but to me, that's an appealing idea anyway, because I don't have a single point of failure. Any of my Nagios installations could die completely, and I'd be alerted by the others, just like any one internet connection could die, and I'd still get alerts about it. In the event of a weird failure, I can pretty much construct the network diagram based on which links are reporting up, and from where. It does require a certain amount of configuration overhead, but most of that is done with templating anyway. I don't have my system laid out exactly like I want, but I'm implementing version control (subversion, in my case) and I have a different Nagios repository for each site. If I had more templates (or more shared configuration files), I would probably have a 'nagios-shared' repository, so I wouldn't have to replicate everything manually. As for the arrangement of my configs, it mostly follows this howto that I did a year ago: http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/ Hope it can help someone --Matt On Thu, Jun 10, 2010 at 3:55 PM, Kevin Keane subscript...@kkeane.com wrote: Nagios does have some scalability issues, but for the most part you won’t run into them until you get to truly huge installations. I can see three main scalability issues: config file maintenance and the need for one central server, and firewall issues. Config file maintenance can be improved to some extent with careful design of the config files, as well as tools. It is an issue that I am running into with a relatively small installation with 80+ hosts and 400+ services. 
My installation is highly heterogeneous and very dynamic, which makes config file maintenance a nightmare. Having to restart Nagios after a configuration change doesn’t help either. On the other hand, a network with 2000 identical machines is probably going to be much easier to manage than my type of network. The central server is an obvious bottleneck. No matter how powerful the machine and the network connection, there are only so many checks results it can handle. Fortunately, Nagios doesn’t require much horsepower. Distributed monitoring helps with this issue because the most expensive part of Nagios is running active checks. With distributed monitoring, the active checks can run on multiple smaller boxes, and then send the check results back as passive checks. Of course distributed monitoring compounds the config file maintenance issue, because you have to configure each check multiple times. The third issue is not directly a scalability issue. Nagios is built with the assumption of a local and mostly trusted network. It’s non-trivial to securely get checks to work on remote machines without pretty gaping poking holes into firewalls, and/or frequently establishing and tearing down encrypted connections with the attendant processing load. There are some third-party solutions for this issue, though. From: Scott Ward [mailto:13.sward...@gmail.com] Sent: Thursday, June 10, 2010 12:34 PM To: Nagios Users List Subject: Re: [Nagios-users] Large Installation Make sure to read these pages: http://nagios.sourceforge.net/docs/3_0/tuning.html http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html Also, if you're monitoring 800 machines across WANs, you might look into distributed monitoring: http://nagios.sourceforge.net/docs/3_0/distributed.html Let us know how it goes! Thanks for the links. So the distributive monitoring provided by the Nagios docs can handle what we're trying to do? I have read in a few places that Nagios has scalability issues. 
--Matt
BTW, what are you using for your config maintenance?

We haven't decided yet. Do you have any recommendations?

~S

On Thu, Jun 10, 2010 at 2:23 PM, Matt Simmons standalone.sysad...@gmail.com wrote:

Make sure to read these pages:
http://nagios.sourceforge.net/docs/3_0/tuning.html
http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Also, if you're monitoring 800 machines across WANs, you might look into distributed monitoring:
http://nagios.sourceforge.net/docs/3_0/distributed.html

Let us know how it goes!

--Matt
BTW, what are you using for your config maintenance?

On Thu, Jun 10, 2010 at 1:51 PM, Scott Ward 13.sward...@gmail.com wrote:

We are looking to do a large installation of Nagios. Is it possible to monitor over 800 machines and over 14000 services? Has anyone tried anything like this? If you have, how successful was it and how did you configure it?

~Rultax
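To put Scott's numbers in perspective, the average active-check load on the scheduler is roughly the number of services divided by the check interval. A minimal sketch, assuming every service runs at one fixed normal check interval (5 and 10 minutes below are assumed values) and ignoring retries and host checks:

```python
def checks_per_second(services, interval_minutes=5.0):
    """Average scheduled service checks per second at a fixed check interval."""
    return services / (interval_minutes * 60.0)


if __name__ == "__main__":
    # Figures from the original question; the intervals are assumptions.
    for interval in (5.0, 10.0):
        rate = checks_per_second(14000, interval)
        print(f"{interval:g}-minute interval: {rate:.1f} checks/s")
    print(f"services per host: {14000 / 800:.1f}")
```

Under those assumptions, 14000 services work out to roughly 47 checks per second at a 5-minute interval (about 23 at 10 minutes), or 17.5 services per host. That sustained rate is what the tuning and large-installation pages above are about, and it is the load distributed monitoring spreads across several boxes.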
Re: [Nagios-users] Strange fluctuation in load average
I'm using uptime to obtain the load average. Here's a snippet of the values:

09:17:34 up 5 days, 16:06, 3 users, load average: 2.07, 2.61, 3.45
09:19:34 up 5 days, 16:08, 3 users, load average: 9.09, 4.78, 4.13
09:21:34 up 5 days, 16:10, 3 users, load average: 10.05, 6.69, 4.91
09:23:34 up 5 days, 16:12, 3 users, load average: 8.83, 7.08, 5.24
09:25:34 up 5 days, 16:14, 3 users, load average: 9.42, 8.26, 5.91
09:27:34 up 5 days, 16:16, 3 users, load average: 4.43, 6.66, 5.60
09:29:34 up 5 days, 16:18, 3 users, load average: 13.06, 8.85, 6.51
09:31:34 up 5 days, 16:20, 3 users, load average: 7.35, 8.61, 6.73
09:33:34 up 5 days, 16:22, 3 users, load average: 7.87, 7.96, 6.69
09:35:34 up 5 days, 16:24, 3 users, load average: 4.25, 6.94, 6.49
09:37:34 up 5 days, 16:26, 3 users, load average: 2.50, 5.34, 5.95
09:39:34 up 5 days, 16:28, 3 users, load average: 7.53, 6.21, 6.19
09:41:34 up 5 days, 16:30, 3 users, load average: 5.71, 6.11, 6.15
09:43:34 up 5 days, 16:32, 3 users, load average: 1.56, 4.39, 5.51

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
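If readings like these are being fed into Nagios or a graphing tool, the three figures can be pulled out of each uptime line. The 1-minute figure reacts quickly while the 15-minute figure is heavily smoothed, which is why the first column swings so much more than the last. A minimal sketch; the `is_short_burst` heuristic and its factor of 2 are illustrative assumptions, not something from the thread:

```python
import re

# Matches the three figures after "load average:" (Linux) or "load averages:" (BSD/macOS).
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime(line):
    """Return the (1, 5, 15)-minute load averages from one uptime line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load average found in: {line!r}")
    return tuple(float(value) for value in match.groups())


def is_short_burst(loads, factor=2.0):
    """Hypothetical heuristic: the 1-minute load is well above the 15-minute trend."""
    one, _five, fifteen = loads
    return one >= factor * fifteen
```

Applied to the 09:19:34 line above, for example, the 1-minute value of 9.09 is more than twice the 15-minute value of 4.13, which points at a short burst rather than a sustained rise.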
[Nagios-users] Nagios and IBM Tivoli SRM integration
Hello everyone,

I have a simple question: has anyone succeeded in integrating Nagios with this Service Request Manager tool, or with any other IBM Tivoli software? My goal is to automatically open a new service request in this framework for every Nagios DOWN notification.

Thank you very much in advance; any suggestion will be appreciated.

Francisco.

THIS MESSAGE IS CONFIDENTIAL. It may also contain information that is privileged or otherwise legally exempt from disclosure. If you have received it by mistake, please let us know by e-mail immediately and delete it from your system; you should also not copy the message or disclose its contents to anyone. Many thanks.
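On the Nagios side, this kind of integration usually means a custom host notification command that passes notification macros to a script, which then talks to the ticketing system. A minimal sketch, assuming a hypothetical script at /usr/local/bin/notify_srm called from a command definition such as `command_line /usr/local/bin/notify_srm "$NOTIFICATIONTYPE$" "$HOSTNAME$" "$HOSTSTATE$" "$HOSTOUTPUT$"`. The request fields and `submit_to_srm` are placeholders; nothing here reflects Tivoli SRM's actual interface:

```python
import sys


def should_open_request(notification_type, host_state):
    """Only a DOWN problem opens a request; recoveries, acks, etc. are ignored."""
    return notification_type == "PROBLEM" and host_state == "DOWN"


def build_request(host, state, output):
    """Hypothetical request payload; the field names are placeholders, not SRM's schema."""
    return {
        "summary": f"Nagios: host {host} is {state}",
        "description": output,
        "source": "nagios",
    }


def submit_to_srm(request):
    """Placeholder: call whatever interface your Tivoli SRM installation exposes."""
    print(request)


def main(args):
    # Expected order, as passed by the hypothetical notification command:
    # $NOTIFICATIONTYPE$ $HOSTNAME$ $HOSTSTATE$ $HOSTOUTPUT$
    if len(args) != 4:
        print("usage: notify_srm TYPE HOST STATE OUTPUT", file=sys.stderr)
        return
    notification_type, host, state, output = args
    if should_open_request(notification_type, state):
        submit_to_srm(build_request(host, state, output))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the DOWN filter in the script rather than relying only on the contact's notification options means a misconfigured contact can't flood the service desk with RECOVERY or UNREACHABLE requests.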
Re: [Nagios-users] check_yum issue
On Sat, Jun 5, 2010 at 9:02 AM, Kevin Keane subscript...@kkeane.com wrote:

You would probably want to use sudo. Instead of having NRPE call check_yum directly, have it call sudo check_yum, and add check_yum for the Nagios user to your sudoers (making sure not to require a password, of course!). Be sure to keep the sudoers entry as restrictive as possible, or you may open a security hole.

-----Original Message-----
From: Terry [mailto:td3...@gmail.com]
Sent: Thursday, June 03, 2010 11:40 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] check_yum issue

On Thu, Jun 3, 2010 at 1:28 PM, Terry td3...@gmail.com wrote:

Hello,

I am trying to use check_yum:
http://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Linux/Check_Yum/details

It works great from the command line:

[r...@foo ~]# yum --security check-update
Loaded plugins: dellsysid, rhnplugin, security
Limiting package lists to security relevant ones
Needed 4 of 11 packages, for security
rhn-check.noarch 0.4.20-33.el5_5.2 rhel-x86_64-server-5
rhn-client-tools.noarch 0.4.20-33.el5_5.2 rhel-x86_64-server-5
rhn-setup.noarch 0.4.20-33.el5_5.2 rhel-x86_64-server-5
rhn-setup-gnome.noarch 0.4.20-33.el5_5.2 rhel-x86_64-server-5
[r...@foo ~]# /usr/lib64/nagios/plugins/check_yum
YUM CRITICAL: 4 Security Updates Available. 7 Non-Security Updates Available
[r...@foo ~]# echo $?
2

From Nagios, however, it returns this:

[r...@foo ~]# /usr/lib64/nagios/plugins/check_nrpe -H 10.0.0.2 -t 50 -c check_yum
YUM OK: 0 Security Updates Available

Here's my NRPE configuration:

[r...@bar ~]# cat /etc/nagios/nrpe.cfg | grep check_yum
command[check_yum]=/usr/lib64/nagios/plugins/check_yum

What am I missing here?

I think I failed here. This is a permissions issue, as noted in the description of the plugin. Is anyone doing something similar? If so, how is your solution architected? Thanks!

I think I did one better, maybe. I am having Nagios call check_by_ssh, which uses a key that is specific to this command. On the remote side, I am configuring authorized_keys like this:

command="/usr/lib/nagios/plugins/check_yum" ssh-rsa AA.

The only thing this key can do is call check_yum on the remote end.
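Both approaches in this thread come down to a few configuration lines. A sketch that prints them, assuming the NRPE daemon runs as a user called `nrpe` (on many installs it is `nagios` instead), that sudo lives at /usr/bin/sudo, and that the public key passed in is a placeholder; the plugin paths are the ones quoted above:

```python
# Paths from the thread: the NRPE side uses lib64, the check_by_ssh side uses lib.
NRPE_PLUGIN = "/usr/lib64/nagios/plugins/check_yum"
SSH_PLUGIN = "/usr/lib/nagios/plugins/check_yum"

# Standard authorized_keys options that strip everything but the forced command.
SSH_RESTRICTIONS = ("no-port-forwarding", "no-X11-forwarding",
                    "no-agent-forwarding", "no-pty")


def sudoers_lines(user, plugin=NRPE_PLUGIN):
    """Allow exactly one command as root, with no password and no tty required."""
    return [
        f"Defaults:{user} !requiretty",
        f"{user} ALL=(root) NOPASSWD: {plugin}",
    ]


def nrpe_command(name, plugin=NRPE_PLUGIN, sudo="/usr/bin/sudo"):
    """nrpe.cfg entry that runs the plugin through sudo."""
    return f"command[{name}]={sudo} {plugin}"


def authorized_keys_line(public_key, plugin=SSH_PLUGIN):
    """Key entry whose only capability is running the plugin."""
    return ",".join([f'command="{plugin}"', *SSH_RESTRICTIONS]) + " " + public_key


if __name__ == "__main__":
    for line in sudoers_lines("nrpe"):
        print(line)
    print(nrpe_command("check_yum"))
    print(authorized_keys_line("ssh-rsa AAAA... nagios@central"))
```

With a forced command in place, sshd runs check_yum no matter what command check_by_ssh requests, so the key can't be reused for a shell. The `!requiretty` line only matters where the default sudoers contains `Defaults requiretty`, as some Red Hat releases do; NRPE runs without a terminal, so sudo would otherwise refuse to run the plugin.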