Re: [Nagios-users] Nagios as a Service Resiliency Manager
On 10/12/09 12:08 PM, Christopher McAtackney wrote:
> Hi all,
>
> I have a need to control an Active / Passive pair of components and was wondering if anyone had tackled this problem with Nagios?
>
> The scenario is as follows: Host A has SERVICE_1 installed and running. Host B has SERVICE_2 installed, but not running. The desired functionality is to detect when SERVICE_1 is not running (or that Host A is down / unreachable), and then to start SERVICE_2 on Host B.
>
> I believe I can do this with Nagios by defining an event handler on SERVICE_1 which will make the appropriate call to start SERVICE_2 on Host B. Would it make sense to store the relationship between SERVICE_1 and Host B / SERVICE_2 as a service macro, e.g. $_SERVICE_PASSIVE_HOSTNAME, $_SERVICE_PASSIVE_SERVICENAME?
>
> I believe there are too many scenarios in which SERVICE_1 might come back up to try to automate the switching off of SERVICE_2, e.g. if someone pulled a network cable on Host A accidentally, then plugged it in 15 minutes later - during which time Nagios detects that it is down and so starts up SERVICE_2. The user then plugs the network lead back in and now we have two Active instances running - which is what we specifically wanted to avoid. Even if Nagios detects that the primary component is up, it's still too late because any Active / Active overlap will cause problems for this particular application.
>
> I can't think of any way to automate that side of things - but does the general concept of having Nagios start up a Passive partner make sense?

Short answer: not really.

You're talking about clustering here, and clustering has its very own set of problems that Nagios was never meant to solve. You should rather spend your time looking at a real clustering solution like Linux-HA (I used this one, but I know there is other OSS clustering software around...). Once you have your cluster set up, it makes sense to monitor the services *and* the cluster software using Nagios.
For failover services I find the easiest way is to use a shared IP (an IP that moves from one server to the other along with the services - this is very easy to add once the cluster is set up), so you always look for the service where it's supposed to be running. If a shared IP isn't an option, just monitor the service on both servers and use check_cluster to aggregate the results across all nodes.

I'm not saying that you can't achieve this using Nagios... It might actually work for very simplistic scenarios, but even in that case you may end up accidentally running the service on both servers if you're not very careful (something that clustering software will not let happen). You have to take into account not only every possible failure scenario but also every possible thing a human could be doing at the same time your handlers try to recover the service!

It's kind of like reinventing the wheel, but not even using the right tools :)

--
Thomas

--
Return on Information: Google Enterprise Search pays you back
Get the facts. http://p.sf.net/sfu/google-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
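To make the "monitor the service on both servers" suggestion concrete, here is a minimal Python sketch of the decision such a combined check has to make for an Active / Passive pair. It is not the check_cluster plugin and does not reproduce its options; the function name `evaluate_pair` and the host names are hypothetical, and the input (which node currently runs the service) is assumed to come from per-node checks that already exist.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_pair(running_by_host):
    """Return (exit_code, message) for an Active / Passive pair.

    running_by_host is a hypothetical input: a dict mapping each node
    name to True if the service is currently running on that node.
    Exactly one active node is healthy; zero or several is a problem.
    """
    active = sorted(host for host, up in running_by_host.items() if up)
    if len(active) == 1:
        return OK, "OK - service active on %s" % active[0]
    if not active:
        return CRITICAL, "CRITICAL - service not running on any node"
    return CRITICAL, "CRITICAL - service active on multiple nodes: %s" % ", ".join(active)
```

A check like this only reports an Active / Active overlap after it has happened. It does not prevent one, which is the job of the clustering software.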
Re: [Nagios-users] Nagios as a Service Resiliency Manager
That's an interesting link - but unfortunately I don't think it really covers the situation where a host goes down or becomes unreachable. It may be the case that Nagios is not suitable for this purpose, but I thought I would check on here in case anyone had done anything like this previously.

Cheers,
Chris

2009/12/10 Marcel mits...@gmail.com:
> Maybe this would help:
> http://onlamp.com/onlamp/2006/05/25/self-healing-networks.html
Re: [Nagios-users] Nagios as a Service Resiliency Manager
Chris, the great thing about Nagios is that it enables creative solutions like this. I'd love to see you try it and report back on how it works for you.

\\Greg
[Nagios-users] Nagios as a Service Resiliency Manager
Hi all,

I have a need to control an Active / Passive pair of components and was wondering if anyone had tackled this problem with Nagios?

The scenario is as follows: Host A has SERVICE_1 installed and running. Host B has SERVICE_2 installed, but not running. The desired functionality is to detect when SERVICE_1 is not running (or that Host A is down / unreachable), and then to start SERVICE_2 on Host B.

I believe I can do this with Nagios by defining an event handler on SERVICE_1 which will make the appropriate call to start SERVICE_2 on Host B. Would it make sense to store the relationship between SERVICE_1 and Host B / SERVICE_2 as a service macro, e.g. $_SERVICE_PASSIVE_HOSTNAME, $_SERVICE_PASSIVE_SERVICENAME?

I believe there are too many scenarios in which SERVICE_1 might come back up to try to automate the switching off of SERVICE_2, e.g. if someone pulled a network cable on Host A accidentally, then plugged it in 15 minutes later - during which time Nagios detects that it is down and so starts up SERVICE_2. The user then plugs the network lead back in and now we have two Active instances running - which is what we specifically wanted to avoid. Even if Nagios detects that the primary component is up, it's still too late because any Active / Active overlap will cause problems for this particular application.

I can't think of any way to automate that side of things - but does the general concept of having Nagios start up a Passive partner make sense?

Thanks for any insight you have,
Chris
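For illustration, the event handler idea described above could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a tested failover mechanism: the script name `start_passive.py`, the custom variables `_PASSIVE_HOSTNAME` and `_PASSIVE_SERVICENAME`, passwordless SSH from the Nagios host to Host B, and a `service <name> start` command on Host B are all assumptions. Note that Nagios exposes a custom service variable named `_PASSIVE_HOSTNAME` as the macro `$_SERVICEPASSIVE_HOSTNAME$`.

```python
import subprocess
import sys

# Hypothetical Nagios command definition that would call this script:
#   command_line $USER1$/start_passive.py $SERVICESTATE$ $SERVICESTATETYPE$ \
#       $_SERVICEPASSIVE_HOSTNAME$ $_SERVICEPASSIVE_SERVICENAME$


def should_start_passive(state, state_type):
    """Act only on a confirmed (HARD) CRITICAL state, not on soft retries."""
    return state == "CRITICAL" and state_type == "HARD"


def build_start_command(passive_host, passive_service):
    """Build the remote start command (assumes SSH access and sudo rights)."""
    return ["ssh", passive_host, "sudo", "service", passive_service, "start"]


def main(argv):
    """Entry point; a real script would end with sys.exit(main(sys.argv))."""
    if len(argv) != 5:
        sys.stderr.write("usage: start_passive.py STATE STATETYPE HOST SERVICE\n")
        return 2
    state, state_type, passive_host, passive_service = argv[1:5]
    if should_start_passive(state, state_type):
        return subprocess.call(build_start_command(passive_host, passive_service))
    return 0  # nothing to do for OK, WARNING or SOFT states
```

The handler deliberately never stops SERVICE_2 again, and nothing in it guards against the Active / Active overlap described in the message; it covers only the "start the Passive partner" half of the problem.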
Re: [Nagios-users] Nagios as a Service Resiliency Manager
Maybe this would help:
http://onlamp.com/onlamp/2006/05/25/self-healing-networks.html