Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
So are you using the actual reboot command rather than "shutdown -r now", which is a little friendlier? The standard nagios shutdown script should take care of cleaning those up for you. Otherwise, putting something like:

rm -f lockfile; service nagios start

in your rc.local would take care of it. But when you mention the PID file, are you saying the PID file is still there, or the lock file? They are different things. Again though, if nagios is shut down properly you shouldn't be seeing that.

Dan

-----Original Message-----
From: eric.b...@barclayscapital.com [mailto:eric.b...@barclayscapital.com]
Sent: Monday, December 20, 2010 6:59 PM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

We reboot all of our hosts on a weekly basis. I used to pride myself on keeping my boxes up as long as possible, but having spent years now supporting mission-critical financial production applications, I'm on board with the weekly reboots. It lets you know early if some system or app change is problematic.

The reboot is being done via a standard reboot command. I've looked around for rc scripts that might address this issue, but haven't found any. Got any pointers?

Regarding the rc.local solution, a) I'd prefer to solve the problem, not just address the symptoms, and b) elsewhere in this thread I've described the roadblocks that we have to doing anything at a system level. Yep, that's right, boys, we survive in the app developer layer, within which we do not have root on these boxes. It's a tedious, time-consuming, frustrating, productivity-killing endeavor to do just about anything you can't do yourself.

So, got any sample RC scripts, or command line params to nagios to make it smart enough to know that the PID that is in its PID file isn't an active process?

Thanks.
Eric

-----Original Message-----
From: Daniel Wittenberg [mailto:daniel.wittenberg.r...@statefarm.com]
Sent: Monday, December 20, 2010 11:56 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

Couple questions:

1) Why do you have to reboot your monitoring server weekly?
2) How is the reboot being done?

The reason I ask 2) is that the standard rc script will remove the lockfile when nagios is told to stop. So if you are having this problem, it sounds like you are not doing a clean shutdown and something could be wrong. Either way, I guess worst case, one way to work around this would be to put something like this in your /etc/rc.d/rc.local:

rm -f /var/lock/subsys/nagios

Assuming that's where your lockfile is.

Dan

-----Original Message-----
From: eric.b...@barclayscapital.com [mailto:eric.b...@barclayscapital.com]
Sent: Monday, December 20, 2010 10:16 AM
To: eric.b...@barclayscapital.com; nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

Alternatively, could you recommend a good system/resource monitoring tool that would be able to let me know if nagios is down and restart it automatically?

_____
From: Berg, Eric: IT (NYK)
Sent: Monday, December 20, 2010 11:03 AM
To: 'nagios-users@lists.sourceforge.net'
Subject: Nagios kept from restarting after reboot by lock file

Gee, this seems like an annoying newbie problem, but if Nagios crashes or is killed (as on system reboot), it leaves a lock file around that prevents it from starting again until the lock file is manually removed. I see this on Monday mornings after weekend reboots on a Red Hat Linux box:

nagios: Lockfile '/home/nagios/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 0). Bailing out...
Does anyone know if there's a config option or something else that obviates the need to write a wrapper script to check whether Nagios is really running and remove the lock file (it looks like Nagios already knows it's not running, by virtue of the value of the PID in this very message!) so that it can cleanly start up again?

Thanks.

Eric

___

This e-mail may contain information that is confidential, privileged or otherwise protected from disclosure. If you are not an intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete it and any attachments and notify the sender that you have received it in error. Unless specifically indicated, this e-mail is not an offer to buy or sell or a solicitation to buy or sell any securities, investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Barclays. Any views or opinions presented are solely those of the author and do not necessarily represent those of Barclays. This e-mail is subject to terms available at the following link
Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
Good stuff, Dan. I was not aware of the differences between how the reboot and shutdown commands handle the reboot process. It turns out that we're doing a reboot -f, which explains why I have orphaned PID files lying around. I'm going to make the call right now that the fight to have 'reboot -f' changed to the plays-more-nicely-with-others shutdown -r is already lost, and I'm going to work around that in code. Thanks for helping clarify this.

It's weird: when I run nagios and kill it with -9, it leaves the PID file intact, but when I restart it, it zeroes out the PID file and starts just fine. When I just kill it with the default kill signal, it removes the PID file.

In any case, I now know what the issues are and how to address this. Thanks again very much for your help, guys. You are a feature of Nagios.

Eric

-----Original Message-----
From: Daniel Wittenberg [mailto:daniel.wittenberg.r...@statefarm.com]
Sent: Tuesday, December 21, 2010 9:23 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

So are you using the actual reboot command rather than "shutdown -r now", which is a little friendlier? The standard nagios shutdown script should take care of cleaning those up for you. Otherwise, putting something like:

rm -f lockfile; service nagios start

in your rc.local would take care of it. But when you mention the PID file, are you saying the PID file is still there, or the lock file? They are different things. Again though, if nagios is shut down properly you shouldn't be seeing that.

Dan

-----Original Message-----
From: eric.b...@barclayscapital.com [mailto:eric.b...@barclayscapital.com]
Sent: Monday, December 20, 2010 6:59 PM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

We reboot all of our hosts on a weekly basis.
Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
eric.b...@barclayscapital.com wrote:

> It's weird: when I run nagios and kill it with -9, it leaves the PID file intact, but when I restart it, it zeroes out the PID file and starts just fine. When I just kill it with the default kill signal, it removes the PID file.

This isn't weird. That's how it should work. kill -9 sends an uncatchable, compulsory kill signal (SIGKILL) to the process, giving it no time to clean up before exiting. The default kill signal is SIGTERM, which can be caught and handled (or ignored) by the process.

Restarting Nagios from the web interface doesn't terminate and restart the process (the PID doesn't change); it only re-initializes it.

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
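The SIGTERM/SIGKILL difference described above can be reproduced without Nagios. What follows is a minimal sketch, not Nagios code: a throwaway child shell writes its PID to a temporary file and installs a TERM handler that deletes the file. The function name demo_signal and the temporary file are hypothetical, made up for this illustration.

```shell
#!/bin/sh
# demo_signal SIGNAME: start a child that cleans up its PID file on SIGTERM,
# send it the given signal, and report whether the PID file survived.
# (demo_signal is a hypothetical name used only for this illustration.)
demo_signal() {
    sig=$1
    pidfile=$(mktemp)
    # Child: record its own PID, trap TERM to remove the file, then idle.
    sh -c 'echo $$ > "$1"; trap "rm -f \"$1\"; exit 0" TERM; while :; do sleep 1; done' \
        sh "$pidfile" >/dev/null 2>&1 &
    child=$!
    sleep 1                          # give the child time to install its trap
    kill "-$sig" "$child"
    wait "$child" 2>/dev/null || true
    if [ -e "$pidfile" ]; then
        echo "left behind"           # the child had no chance to clean up
        rm -f "$pidfile"
    else
        echo "removed"               # the TERM handler ran and deleted the file
    fi
}
```

Calling `demo_signal TERM` should print "removed" and `demo_signal KILL` should print "left behind", which matches what Eric saw with the Nagios PID file.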
Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
Couple questions:

1) Why do you have to reboot your monitoring server weekly?
2) How is the reboot being done?

The reason I ask 2) is that the standard rc script will remove the lockfile when nagios is told to stop. So if you are having this problem, it sounds like you are not doing a clean shutdown and something could be wrong. Either way, I guess worst case, one way to work around this would be to put something like this in your /etc/rc.d/rc.local:

rm -f /var/lock/subsys/nagios

Assuming that's where your lockfile is.

Dan

-----Original Message-----
From: eric.b...@barclayscapital.com [mailto:eric.b...@barclayscapital.com]
Sent: Monday, December 20, 2010 10:16 AM
To: eric.b...@barclayscapital.com; nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

Alternatively, could you recommend a good system/resource monitoring tool that would be able to let me know if nagios is down and restart it automatically?

_____
From: Berg, Eric: IT (NYK)
Sent: Monday, December 20, 2010 11:03 AM
To: 'nagios-users@lists.sourceforge.net'
Subject: Nagios kept from restarting after reboot by lock file

Gee, this seems like an annoying newbie problem, but if Nagios crashes or is killed (as on system reboot), it leaves a lock file around that prevents it from starting again until the lock file is manually removed. I see this on Monday mornings after weekend reboots on a Red Hat Linux box:

nagios: Lockfile '/home/nagios/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 0). Bailing out...

Does anyone know if there's a config option or something else that obviates the need to write a wrapper script to check whether Nagios is really running and remove the lock file (it looks like Nagios already knows it's not running, by virtue of the value of the PID in this very message!) so that it can cleanly start up again?

Thanks.
Eric

___

This e-mail may contain information that is confidential, privileged or otherwise protected from disclosure. If you are not an intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete it and any attachments and notify the sender that you have received it in error. Unless specifically indicated, this e-mail is not an offer to buy or sell or a solicitation to buy or sell any securities, investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Barclays. Any views or opinions presented are solely those of the author and do not necessarily represent those of Barclays. This e-mail is subject to terms available at the following link: www.barcap.com/emaildisclaimer. By messaging with Barclays you consent to the foregoing. Barclays Capital is the investment banking division of Barclays Bank PLC, a company registered in England (number 1026167) with its registered office at 1 Churchill Place, London, E14 5HP. This email may relate to or be sent from other members of the Barclays Group.

___
Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
We reboot all of our hosts on a weekly basis. I used to pride myself on keeping my boxes up as long as possible, but having spent years now supporting mission-critical financial production applications, I'm on board with the weekly reboots. It lets you know early if some system or app change is problematic.

The reboot is being done via a standard reboot command. I've looked around for rc scripts that might address this issue, but haven't found any. Got any pointers?

Regarding the rc.local solution, a) I'd prefer to solve the problem, not just address the symptoms, and b) elsewhere in this thread I've described the roadblocks that we have to doing anything at a system level. Yep, that's right, boys, we survive in the app developer layer, within which we do not have root on these boxes. It's a tedious, time-consuming, frustrating, productivity-killing endeavor to do just about anything you can't do yourself.

So, got any sample RC scripts, or command line params to nagios to make it smart enough to know that the PID that is in its PID file isn't an active process?

Thanks.

Eric

-----Original Message-----
From: Daniel Wittenberg [mailto:daniel.wittenberg.r...@statefarm.com]
Sent: Monday, December 20, 2010 11:56 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile

Couple questions:

1) Why do you have to reboot your monitoring server weekly?
2) How is the reboot being done?

The reason I ask 2) is that the standard rc script will remove the lockfile when nagios is told to stop. So if you are having this problem, it sounds like you are not doing a clean shutdown and something could be wrong. Either way, I guess worst case, one way to work around this would be to put something like this in your /etc/rc.d/rc.local:

rm -f /var/lock/subsys/nagios

Assuming that's where your lockfile is.
Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
On 12/21/2010 01:58 AM, eric.b...@barclayscapital.com wrote:

> We reboot all of our hosts on a weekly basis. I used to pride myself on keeping my boxes up as long as possible, but having spent years now supporting mission-critical financial production applications, I'm on board with the weekly reboots. It lets you know early if some system or app change is problematic.
>
> The reboot is being done via a standard reboot command. I've looked around for rc scripts that might address this issue, but haven't found any. Got any pointers?
>
> Regarding the rc.local solution, a) I'd prefer to solve the problem, not just address the symptoms, and b) elsewhere in this thread I've described the roadblocks that we have to doing anything at a system level. Yep, that's right, boys, we survive in the app developer layer, within which we do not have root on these boxes. It's a tedious, time-consuming, frustrating, productivity-killing endeavor to do just about anything you can't do yourself.
>
> So, got any sample RC scripts, or command line params to nagios to make it smart enough to know that the PID that is in its PID file isn't an active process?

Depending on what system tools you've got installed, this should work decently. Set variables to proper values and add it to the top of your init script.

    # Read the PID recorded in the lock file (empty if the file is missing)
    pid=$(cat "$lockfile" 2>/dev/null)
    # Remove the lock file if no process with that PID can be signalled
    kill -0 "$pid" 2>/dev/null || rm -f "$lockfile"
    # Remove it as well if that PID is not the last one pidof lists for nagios
    nagiospid=$(pidof nagios | sed 's/.* //')
    test "$pid" = "$nagiospid" || rm -f "$lockfile"

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace.
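For the no-root case Eric describes, the same check could live in a small wrapper that runs before Nagios is started. This is a rough sketch under stated assumptions, not something from the Nagios distribution: lock_is_stale and clear_stale_lock are hypothetical helper names, the wrapper is assumed to run as the same user that owns the Nagios process (otherwise kill -0 can fail with a permission error even for a live process), and PID reuse after a reboot is not handled (the pidof comparison in the snippet above covers that case). An empty lock file or a recorded PID of 0, as in the error message from the original post, is treated as stale.

```shell
#!/bin/sh
# lock_is_stale FILE: succeed (return 0) if FILE exists but the PID it
# records does not belong to a live process. Hypothetical helper name.
lock_is_stale() {
    lockfile=$1
    [ -f "$lockfile" ] || return 1             # no lock file, nothing stale
    pid=$(tr -cd '0-9' < "$lockfile")          # keep only the digits
    # An empty value or 0 is never a real PID; "kill -0 0" would target our
    # own process group and always succeed, so handle that case explicitly.
    [ -n "$pid" ] && [ "$pid" -gt 0 ] || return 0
    # kill -0 delivers no signal; it only checks that the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        return 1                               # process is alive, lock is valid
    fi
    return 0
}

# clear_stale_lock FILE: remove FILE only when lock_is_stale says so.
clear_stale_lock() {
    if lock_is_stale "$1"; then
        rm -f "$1"
    fi
}
```

With the path from the error message earlier in the thread, a wrapper would call `clear_stale_lock /home/nagios/nagios/var/nagios.lock` immediately before whatever command starts Nagios on your system.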