Re: [Nagios-users] extra checkresults files being left behind

2010-06-10 Thread Mathew Walker


Nagios v3.2.0

 

And I see the check and check.ok files:

-rw------- 1 nagios nagios 291 Jun  9 07:12 checkzGuzY7
-rw------- 1 nagios nagios 280 Jun  7 21:54 checkzjh6PZ
-rw------- 1 nagios nagios 483 Jun 10 13:07 cxHWRxJ
-rw------- 1 nagios nagios   0 Jun 10 13:07 cxHWRxJ.ok


But the check* orphan files just keep showing up.  They don't relate to a 
specific host or check.  No real pattern to time, host, service, etc.  I could 
understand if the system was hitting 100% memory or CPU... but the memory is 
pretty stable in the 50-70% used range.  Load is nearly 0.00 across the board.  
The system is pretty much dedicated to my running nagios as a test box.
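
One way to double-check the "no real pattern" impression is to tally the leftover files by host and service. The following is only a rough sketch: it assumes each result file holds plain key=value lines including host_name and service_description, with one result per file, and it treats any file without a matching .ok companion as an orphan. The function names are made up for illustration.

```python
import os
from collections import Counter


def parse_result_file(path):
    """Collect the key=value pairs from one check result file.

    Assumption: comment lines start with '#' and every other
    non-empty line has the form key=value.
    """
    fields = {}
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            fields[key] = value
    return fields


def tally_orphans(spool_dir):
    """Count orphaned result files per (host, service) pair.

    A file counts as an orphan here when no '<name>.ok' file
    sits next to it in the same directory.
    """
    names = set(os.listdir(spool_dir))
    counts = Counter()
    for name in sorted(names):
        path = os.path.join(spool_dir, name)
        if name.endswith(".ok") or not os.path.isfile(path):
            continue
        if name + ".ok" in names:
            continue  # completed result, still waiting to be processed
        fields = parse_result_file(path)
        host = fields.get("host_name", "?")
        service = fields.get("service_description", "(host check)")
        counts[(host, service)] += 1
    return counts
```

Calling tally_orphans("/usr/local/nagios/var/spool/checkresults") and printing counts.most_common() would show whether a single host or service dominates the leftovers.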


-- 
Mat W. - http://www.techadre.com


 
 Date: Wed, 9 Jun 2010 20:51:35 -0700
 From: mike-nag...@5dninja.net
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] extra checkresults files being left behind
 
 Mathew Walker wrote:
  I'm running Nagios on a little VPS box checking a few hosts/services 
  (~50 checks). It's mostly a testing platform for me and checks in on my 
  other test VPS systems.
  
  However I keep seeing the extra check results data files build up in 
  /usr/local/nagios/var/spool/checkresults like:
  -rw------- 1 nagios nagios 249 Jun 7 23:45 checknbu01O
  -rw------- 1 nagios nagios 252 Jun 8 02:40 checkHxcsiJ
 
  Googled a bit and didn't come up with much relevant. Any thoughts?
 
 If I remember correctly, the parent nagios process writes out that file, 
 then forks a child. The child then runs the check, updates that file 
 and then creates a file with the same name, plus '.ok' in that 
 directory, letting the parent process know the check is completed.
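
The handshake described above can be modelled in a few lines. This is a toy sketch of the description as given, not Nagios's actual code: there is no real fork, and the function names create_result_file, finish_check and reap are invented for illustration.

```python
import os
import tempfile


def create_result_file(spool_dir):
    """'Parent' step: create an empty, uniquely named result file."""
    fd, path = tempfile.mkstemp(prefix="check", dir=spool_dir)
    os.close(fd)
    return path


def finish_check(path, result_text):
    """'Child' step: write the result, then create '<path>.ok'."""
    with open(path, "w") as fh:
        fh.write(result_text)
    # The empty marker file signals that the result is complete.
    open(path + ".ok", "w").close()


def reap(spool_dir):
    """Process only results that have a '.ok' marker, then delete both files."""
    results = []
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".ok"):
            continue
        ok_path = os.path.join(spool_dir, name)
        data_path = ok_path[:-len(".ok")]
        if os.path.isfile(data_path):
            with open(data_path) as fh:
                results.append(fh.read())
            os.remove(data_path)
        os.remove(ok_path)
    return results
```

In this model, a "child" that dies before finish_check runs leaves a result file with no .ok marker, and reap never touches it. That is one way files like the ones listed could pile up.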
 
 So, take a look at the contents of several of those files. If you're 
 lucky, you'll see that they are all for the same host or the same 
 service check. If so, there might be something in the way that host or 
 service is getting polled that is causing the forked child to die.
 
 Also, if you're running a version older than 3.0rc1, you may want to 
 upgrade (it's generally a good thing to include the version of the tool 
 you're using when asking for help). That version fixed a bug that 
 might be related: "Fixed bug with not deleting old check result files 
 that contained results for invalid host/service"
 
 -- 
 Mike Lindsey
 
 ___
 Nagios-users mailing list
 Nagios-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/nagios-users
 ::: Please include Nagios version, plugin version (-v) and OS when reporting 
 any issue. 
 ::: Messages without supporting info will risk being sent to /dev/null
  

[Nagios-users] extra checkresults files being left behind

2010-06-09 Thread Mathew Walker

I'm running Nagios on a little VPS box checking a few hosts/services (~50 
checks).  It's mostly a testing platform for me and checks in on my other test 
VPS systems.

 

However I keep seeing the extra check results data files build up in 
/usr/local/nagios/var/spool/checkresults like:

-rw------- 1 nagios nagios 249 Jun  7 23:45 checknbu01O
-rw------- 1 nagios nagios 252 Jun  8 02:40 checkHxcsiJ
-rw------- 1 nagios nagios 291 Jun  8 03:52 checkcyaOva
-rw------- 1 nagios nagios 280 Jun  8 04:46 checknlLs4b
-rw------- 1 nagios nagios 250 Jun  8 05:52 checkCMATnr
-rw------- 1 nagios nagios 285 Jun  8 06:21 checkrblxgG
-rw------- 1 nagios nagios 252 Jun  8 07:30 checkikZPk8
-rw------- 1 nagios nagios 285 Jun  8 09:14 check47NrJf
-rw------- 1 nagios nagios 285 Jun  8 13:34 check4g81jo
-rw------- 1 nagios nagios 249 Jun  8 15:15 checkvFH7JT


Some days there will be one or two, some days there will be 30-50.  The days w/ 
more entries seem to be the days with more alerts.  The files will just build 
up and build up for months if I do not manually delete them.
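
For keeping an eye on the buildup without deleting anything by hand, a small report of files past a given age can help. This is a minimal sketch: the function name stale_results and the one-day threshold in the usage note are arbitrary choices for illustration, and it only lists files, it does not remove them.

```python
import os
import time


def stale_results(spool_dir, max_age_seconds, now=None):
    """Return (name, age_in_seconds) for regular files older than the limit.

    Age is measured from the file's modification time.
    """
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if not os.path.isfile(path):
            continue
        age = now - os.stat(path).st_mtime
        if age > max_age_seconds:
            stale.append((name, int(age)))
    return stale
```

For example, stale_results("/usr/local/nagios/var/spool/checkresults", 24 * 3600) would list everything in the spool directory that has been sitting there for more than a day.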

 

I've also seen my one server w/ a passive check not properly update back to 
the dummy default value of OK on occasion.  

 

I've tried tweaking the various config variables like 
max_check_result_file_age=3600 and check_result_reaper_*.  I thought it may 
have been a performance issue with my little VPS, but the memory and CPU load 
(thanks Nagiosgraph) all seem pretty flat.  My typical check interval is 
5 minutes.  With only ~50 checks it shouldn't be THAT much load.
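
To put a number on that load estimate, here is the back-of-the-envelope arithmetic, assuming the checks are spread evenly across the interval (the function name is just for illustration):

```python
def average_check_rate(checks, interval_seconds):
    """Average number of checks started per second."""
    return checks / interval_seconds


rate = average_check_rate(50, 5 * 60)
print(f"{rate:.3f} checks/s, about one check every {1 / rate:.0f} s")
# prints: 0.167 checks/s, about one check every 6 s
```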

 

Googled a bit and didn't come up with much relevant.  Any thoughts?

-- 
Mat W. - http://www.techadre.com
  