Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

2010-12-21 Thread Daniel Wittenberg
So are you using the actual reboot command, not "shutdown -r now", which
is a little friendlier?  The standard nagios shutdown script should take
care of cleaning those up for you.  Otherwise putting something like:
rm -f lockfile; service nagios start
in your rc.local would take care of it.  But when you mention the pid file,
are you saying the PID file is still there, or the lock file?  They
are different things.  Again though, if nagios is shut down properly
you shouldn't be seeing that.

Dan

Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

2010-12-21 Thread eric.berg
Good stuff, Dan.  I was not aware of the differences between how the reboot and 
shutdown commands handle the reboot process.

Turns out that we're doing a reboot -f, which explains why I have orphaned PID 
files lying around.

I'm going to make the call right now that the fight to have 'reboot -f' 
changed to the plays-more-nicely-with-others 'shutdown -r' is already lost, 
and I'm going to work around that in code.

Thanks for helping clarify this.  

It's weird... when I run nagios and kill it with -9, it leaves the pid file 
intact, but when I restart it, it zeroes out the pid file and starts just fine.  
When I just kill it with the default kill signal, it removes the pid file.

In any case, I now know what the issues are and how to address this.  Thanks 
again very much for your help, guys.  You are a feature of Nagios.

Eric

Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

2010-12-21 Thread Paul M. Dubuc
eric.b...@barclayscapital.com wrote:


 It's weird... when I run nagios and kill it with -9, it leaves the pid
 file intact, but when I restart it, it zeroes out the pid file and starts
 just fine. When I just kill it with the default kill signal, it removes the
 pid file.

This isn't weird.  That's how it should work.  kill -9 sends an uncatchable, 
compulsory kill signal (SIGKILL) to the process, giving it no chance to clean 
up before exiting.  The default kill signal is SIGTERM, which can be caught and 
handled (or ignored) by the process.  Restarting Nagios from the web 
interface doesn't terminate and restart the process (the PID doesn't change); 
it only re-initializes it.
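
A minimal sketch of that difference, using a throwaway background shell as a stand-in for the daemon. The function names, the temp directory and the demo.pid file are illustrative assumptions, not anything Nagios ships. The worker removes its PID file when it catches SIGTERM, but never gets the chance when it is sent SIGKILL.

```shell
#!/bin/sh
# Hypothetical demo: a background shell plays the role of a daemon that
# writes a PID file and removes it again when it catches SIGTERM.

start_worker() {    # usage: start_worker PIDFILE
    sh -c '
        trap "rm -f \"$1\"; exit 0" TERM   # cleanup handler, SIGTERM only
        echo $$ > "$1"                     # record the worker PID
        while :; do sleep 1; done
    ' worker "$1" >/dev/null 2>&1 &
    worker_pid=$!
    # Wait until the worker has installed its trap and written the file.
    while [ ! -s "$1" ]; do sleep 1; done
}

signal_demo() {     # usage: signal_demo TERM|KILL
    dir=$(mktemp -d)
    start_worker "$dir/demo.pid"
    kill -s "$1" "$worker_pid"
    wait "$worker_pid" 2>/dev/null || true   # reap it; exit status not needed
    if [ -e "$dir/demo.pid" ]; then
        echo "PID file left behind"
    else
        echo "PID file removed"
    fi
    rm -rf "$dir"
}

echo "SIGTERM: $(signal_demo TERM)"
echo "SIGKILL: $(signal_demo KILL)"
```

With SIGTERM the trap runs and the file is gone afterwards; with SIGKILL the file stays behind, which matches what a forced reboot leaves on disk.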

--
Forrester recently released a report on the Return on Investment (ROI) of
Google Apps. They found a 300% ROI, 38%-56% cost savings, and break-even
within 7 months.  Over 3 million businesses have gone Google with Google Apps:
an online email calendar, and document program that's accessible from your 
browser. Read the Forrester report: http://p.sf.net/sfu/googleapps-sfnew
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

2010-12-20 Thread Daniel Wittenberg
Couple questions
1)  Why do you have to reboot your monitoring server weekly?
2) How is the reboot being done?

The reason I ask 2) is that the standard rc script will remove the
lockfile when nagios is told to stop.  So if you are having this problem,
it sounds like you are not doing a clean shutdown and something could be
wrong.

Either way, I guess worst case, one way to handle this would be to put
something like this in your /etc/rc.d/rc.local:
rm -f /var/lock/subsys/nagios

Assuming that's where your lockfile is. 
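
If you are not sure where the lock file lives: the file Nagios itself writes its PID to is set by the lock_file directive in the main config file, which is separate from the init script's entry under /var/lock/subsys. A small sketch that pulls the path out of a config file follows. The function name and the throwaway example config are assumptions for illustration only.

```shell
#!/bin/sh
# nagios_lock_path CONFIG
# Print the value of the last active lock_file= line in a Nagios main
# config file (commented-out lines are ignored).
nagios_lock_path() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" |
        tail -n 1
}

# Example against a throwaway config; a real one would be your nagios.cfg.
cfg=$(mktemp)
printf '%s\n' '#lock_file=/old/path/nagios.lock' \
              'lock_file=/home/nagios/nagios/var/nagios.lock' > "$cfg"
nagios_lock_path "$cfg"
rm -f "$cfg"
```

Whatever path this prints is the one to clean up or inspect, rather than a guessed location.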

Dan


-----Original Message-----
From: eric.b...@barclayscapital.com
[mailto:eric.b...@barclayscapital.com] 
Sent: Monday, December 20, 2010 10:16 AM
To: eric.b...@barclayscapital.com; nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by
lockfile

Alternatively, could you recommend a good system/resource monitoring
tool that would be able to let me know if nagios is down and restart it
automatically?

_
From: Berg, Eric: IT (NYK)
Sent: Monday, December 20, 2010 11:03 AM
To: 'nagios-users@lists.sourceforge.net'
Subject: Nagios kept from restarting after reboot by lock file

Gee, this seems like an annoying newbie problem, but if Nagios crashes
or is killed (as on system reboot), it leaves a lock file around that
prevents it from starting again until the lock file is manually removed.

I see this on Monday mornings after weekend reboots on a Red Hat Linux
box:

nagios: Lockfile '/home/nagios/nagios/var/nagios.lock' looks like its
already held by another instance of Nagios (PID 0).  Bailing out...

Does anyone know if there's a config option or something else that
obviates the need to write a wrapper script to check whether Nagios is
really running and remove the lock file (looks like Nagios already knows
it's not running by virtue of the value of the PID in this very message!)
so that it can cleanly start up again?

Thanks.

Eric

___

This e-mail may contain information that is confidential, privileged or
otherwise protected from disclosure. If you are not an intended
recipient of this e-mail, do not duplicate or redistribute it by any
means. Please delete it and any attachments and notify the sender that
you have received it in error. Unless specifically indicated, this
e-mail is not an offer to buy or sell or a solicitation to buy or sell
any securities, investment products or other financial product or
service, an official confirmation of any transaction, or an official
statement of Barclays. Any views or opinions presented are solely those
of the author and do not necessarily represent those of Barclays. This
e-mail is subject to terms available at the following link:
www.barcap.com/emaildisclaimer. By messaging with Barclays you consent
to the foregoing.  Barclays Capital is the investment banking division
of Barclays Bank PLC, a company registered in England (number 1026167)
with its registered office at 1 Churchill Place, London, E14 5HP.  This
email may relate to or be sent from other members of the Barclays Group.
___


--
Lotusphere 2011
Register now for Lotusphere 2011 and learn how
to connect the dots, take your collaborative environment
to the next level, and enter the era of Social Business.
http://p.sf.net/sfu/lotusphere-d2d

Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

2010-12-20 Thread eric.berg
We reboot all of our hosts on a weekly basis.  I used to pride myself on 
keeping my boxes up as long as possible, but having spent years now supporting 
mission-critical financial production applications, I'm on board with the 
weekly reboots.  Lets you know early if some system or app change is 
problematic.

Reboot is being done via a standard reboot command.  

I've looked around for rc scripts that might address this issue, but haven't 
found any.  Got any pointers?

Regarding the rc.local solution, a) I'd prefer to solve the problem, not just 
address the symptoms, and b) elsewhere in this thread I've described the 
roadblocks that we have to doing anything at the system level.  Yep, that's right, 
boys, we survive in the app developer layer within which we do not have root on 
these boxes.  It's a tedious, time-consuming, frustrating, productivity-killing 
endeavor to do just about anything you can't do yourself.

So... got any sample RC scripts, or command line params to nagios to make it 
smart enough to know that the PID that is in its PID file isn't an active 
process?

Thanks.

Eric

Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

2010-12-20 Thread Andreas Ericsson
On 12/21/2010 01:58 AM, eric.b...@barclayscapital.com wrote:
 We reboot all of our hosts on a weekly basis.  I used to pride myself on 
 keeping my boxes up as long as possible, but having spent years now 
 supporting mission-critical financial production applications, I'm on board 
 with the weekly reboots.  Lets you know early if some system or app change is 
 problematic.
 
 Reboot is being done via a standard reboot command.
 
 I've looked around for rc scripts that might address this issue, but haven't 
 found any.  Got any pointers?
 
 Regarding the rc.local solution, a) I'd prefer to solve the problem, not just 
 address the symptoms, and b) elsewhere in this thread I've described the 
 roadblocks that we have to doing anything at the system level.  Yep, that's right, 
 boys, we survive in the app developer layer within which we do not have root 
 on these boxes.  It's a tedious, time-consuming, frustrating, 
 productivity-killing endeavor to do just about anything you can't do yourself.
 
 So... got any sample RC scripts, or command line params to nagios to make it 
 smart enough to know that the PID that is in its PID file isn't an active 
 process?
 

Depending on what system tools you've got installed, this should work
decently. Set variables to proper values and add it to the top of your
init script.

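pid=$(cat "$lockfile" 2>/dev/null)
# Stale if no PID is recorded or no such process exists
[ -n "$pid" ] && kill -0 "$pid" 2>/dev/null || rm -f "$lockfile"
# Also stale if the PID is alive but is not the nagios daemon
nagiospid=$(pidof nagios | sed 's/.* //')
[ "$pid" = "$nagiospid" ] || rm -f "$lockfile"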
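
For anyone without root who has to do this from a wrapper rather than the init script, the same idea can be packaged as a function. This is only a sketch under a few assumptions: the function names are made up here, it reads /proc on Linux and falls back to ps elsewhere, and it treats any process whose command line contains the given name as the daemon, which is a deliberately coarse check.

```shell
#!/bin/sh
# proc_cmdline PID: print the command line of PID, or nothing if there is
# no such process.
proc_cmdline() {
    if [ -r "/proc/$1/cmdline" ]; then
        tr '\0' ' ' < "/proc/$1/cmdline"
    else
        ps -p "$1" -o args= 2>/dev/null
    fi
}

# clear_stale_lock LOCKFILE [NAME]
# Returns 0 if the lock file is absent or was stale and has been removed,
# 1 if the PID in it belongs to a live process matching NAME (default: nagios).
clear_stale_lock() {
    lockfile=$1
    name=${2:-nagios}
    [ -f "$lockfile" ] || return 0
    # Keep only the digits of the first line; an empty or zero PID is stale.
    pid=$(head -n 1 "$lockfile" | tr -cd '0-9')
    if [ -n "$pid" ] && [ "$pid" -gt 0 ]; then
        case $(proc_cmdline "$pid") in
            *"$name"*) return 1 ;;   # lock is held by a live process: leave it
        esac
    fi
    rm -f "$lockfile"
}
```

A wrapper would call clear_stale_lock with the lock file path (for the box in this thread, /home/nagios/nagios/var/nagios.lock) just before starting the daemon, and start Nagios only if the function returns 0. A lock file containing PID 0, as in the error message quoted in the original post, is treated as stale.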

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
