Re: [Nagios-users] Nagios as a Service Resiliency Manager

2009-12-13 Thread Thomas Guyot-Sionnest
On 10/12/09 12:08 PM, Christopher McAtackney wrote:
 Hi all,
 
 I have a need to control an Active / Passive pair of components and
 was wondering if anyone had tackled this problem with Nagios?
 
 The scenario is as follows:
 
 Host A has SERVICE_1 installed and running. Host B has SERVICE_2
 installed, but not running.
 
 The desired functionality is to detect when SERVICE_1 is not running
 (or that Host A is down / unreachable), and then to start SERVICE_2 on
 Host B.
 
 I believe I can do this with Nagios by defining an event handler on
 SERVICE_1 which will make the appropriate call to start SERVICE_2 on
 Host B.
 
 Would it make sense to store the relationship between SERVICE_1 and
 Host B / SERVICE_2 as a service macro, e.g.
 $_SERVICE_PASSIVE_HOSTNAME, $_SERVICE_PASSIVE_SERVICENAME?
 
 I believe there are too many scenarios in which SERVICE_1 might come
 back up to try to automate switching SERVICE_2 off. For example, if
 someone accidentally pulled the network cable on Host A and plugged it
 back in 15 minutes later, Nagios would detect in the meantime that Host
 A is down and start SERVICE_2. Once the cable is plugged back in, we
 have two Active instances running, which is exactly what we wanted to
 avoid. Even if Nagios then detects that the primary component is up,
 it's already too late, because any Active / Active overlap will cause
 problems for this particular application.
 
 I can't think of any way to automate that side of things - but does
 the general concept of having Nagios start up a Passive partner make
 sense?

Short answer: not really.

You're talking about clustering here, and clustering has its very own
set of problems that Nagios was never meant to solve. You would be better
off spending your time on a real clustering solution like Linux-HA (I
have used that one, but I know there is other OSS clustering software
around...).

Once you have your cluster set up, it makes sense to monitor both the
services *and* the cluster software with Nagios. For failover services
I find the easiest way is to use a shared IP (an IP address that moves
from one server to the other along with the services; this is very easy
to add once the cluster is set up), so you always look for the service
where it's supposed to be running. If a shared IP isn't an option, just
monitor the service on both servers and use check_cluster to evaluate
its combined state across all nodes.
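check_cluster (from the standard Nagios plugins) alerts on how many of the states it is given are non-OK. The sketch below is not check_cluster; it is a hypothetical standalone check for an active/passive pair, assuming that exactly one OK node is the healthy case and that both nodes being OK (the Active / Active overlap Chris wants to avoid) is also an alert. As with check_cluster, the per-node state IDs could be supplied through on-demand macros such as $SERVICESTATEID:hosta:SERVICE_1$; the function and host names are placeholders.

```python
from typing import Mapping

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_active_passive(node_states: Mapping[str, int]) -> tuple[int, str]:
    """Evaluate an active/passive pair (hypothetical helper, not check_cluster).

    node_states maps a node name to the Nagios state ID of the service on
    that node. Only state 0 (OK) counts as "active"; WARNING, CRITICAL and
    UNKNOWN are all treated as "not active" in this simplified sketch.
    """
    if not node_states:
        return UNKNOWN, "UNKNOWN: no nodes given"
    active = sorted(node for node, state in node_states.items() if state == OK)
    if len(active) == 1:
        return OK, f"OK: service active on {active[0]} only"
    if not active:
        return CRITICAL, "CRITICAL: service not active on any node"
    # More than one active node: the split-brain case.
    return CRITICAL, "CRITICAL: service active on " + ", ".join(active)
```

For example, check_active_passive({"hosta": 0, "hostb": 2}) returns (0, "OK: service active on hosta only"), while {"hosta": 0, "hostb": 0} returns CRITICAL because both nodes claim to be active.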

I'm not saying that you can't achieve this using Nagios... It might
actually work for very simplistic scenarios, but even then you may end
up accidentally running the service on both servers if you're not very
careful (something that clustering software will not let happen). You
have to take into account not only every possible failure scenario but
also every possible thing a human could be doing at the same time your
handlers try to recover the service! It's kind of like reinventing the
wheel, without even using the right tools :)

-- 
Thomas

--
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Nagios as a Service Resiliency Manager

2009-12-11 Thread Christopher McAtackney
That's an interesting link - but unfortunately I don't think it really
covers the situation where a host goes down or becomes unreachable. It
may be the case that Nagios is not suitable for this purpose, but I
thought I would check on here in case anyone had done anything like
this previously.

Cheers,
Chris

2009/12/10 Marcel mits...@gmail.com:
 Maybe this would help:
 http://onlamp.com/onlamp/2006/05/25/self-healing-networks.html



Re: [Nagios-users] Nagios as a Service Resiliency Manager

2009-12-11 Thread gmartin
Chris, the great thing about Nagios is that it enables creative
solutions like this. I'd love to see you try it and report back on how
it works for you.

-- 
Sent from my mobile device

\\Greg



[Nagios-users] Nagios as a Service Resiliency Manager

2009-12-10 Thread Christopher McAtackney
Hi all,

I have a need to control an Active / Passive pair of components and
was wondering if anyone had tackled this problem with Nagios?

The scenario is as follows:

Host A has SERVICE_1 installed and running. Host B has SERVICE_2
installed, but not running.

The desired functionality is to detect when SERVICE_1 is not running
(or that Host A is down / unreachable), and then to start SERVICE_2 on
Host B.

I believe I can do this with Nagios by defining an event handler on
SERVICE_1 which will make the appropriate call to start SERVICE_2 on
Host B.

Would it make sense to store the relationship between SERVICE_1 and
Host B / SERVICE_2 as a service macro, e.g.
$_SERVICE_PASSIVE_HOSTNAME, $_SERVICE_PASSIVE_SERVICENAME?
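A minimal sketch of the event handler described above, written in Python purely as an assumption (any language works). In Nagios, custom service variables are defined with a leading underscore in the service definition (e.g. _PASSIVE_HOSTNAME) and referenced in commands as $_SERVICEPASSIVE_HOSTNAME$. The script name handle_failover.py, its argument order, and starting SERVICE_2 via ssh and an init script are all hypothetical. The handler only acts on a HARD CRITICAL state, because SOFT states are still being retried, and it prints the start command instead of running it. It does nothing to prevent the Active / Active overlap discussed in this thread.

```python
import shlex
import sys

# Assumed command line (hypothetical, not from the original thread):
#   handle_failover.py $SERVICESTATE$ $SERVICESTATETYPE$
#       $_SERVICEPASSIVE_HOSTNAME$ $_SERVICEPASSIVE_SERVICENAME$


def should_start_passive(state: str, state_type: str) -> bool:
    # Only act on a confirmed failure: SOFT states are still being
    # retried by Nagios, and acting early makes a split-brain likelier.
    return state == "CRITICAL" and state_type == "HARD"


def start_command(passive_host: str, passive_service: str) -> list[str]:
    # Hypothetical remote start via ssh and an init script; substitute
    # whatever mechanism your environment actually provides.
    return ["ssh", passive_host, f"/etc/init.d/{passive_service}", "start"]


def main(argv: list[str]) -> int:
    if len(argv) != 5:
        print("usage: handle_failover.py STATE STATETYPE HOST SERVICE",
              file=sys.stderr)
        return 3
    state, state_type, host, service = argv[1:]
    if should_start_passive(state, state_type):
        # Dry run: show the command rather than executing it.
        print("would run:", shlex.join(start_command(host, service)))
    return 0
```

Used as a script, the entry point would be sys.exit(main(sys.argv)). For example, main(["handle_failover.py", "CRITICAL", "HARD", "hostb", "service_2"]) prints "would run: ssh hostb /etc/init.d/service_2 start".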

I believe there are too many scenarios in which SERVICE_1 might come
back up to try to automate switching SERVICE_2 off. For example, if
someone accidentally pulled the network cable on Host A and plugged it
back in 15 minutes later, Nagios would detect in the meantime that Host
A is down and start SERVICE_2. Once the cable is plugged back in, we
have two Active instances running, which is exactly what we wanted to
avoid. Even if Nagios then detects that the primary component is up,
it's already too late, because any Active / Active overlap will cause
problems for this particular application.
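The cable scenario can be made concrete with a toy replay, offered as an illustration only. It assumes a naive handler that starts SERVICE_2 on the first check where Host A looks down (ignoring Nagios's retries) and never stops it again; the function names are hypothetical. Each step counts the Active instances actually running, including SERVICE_1 on Host A, which keeps running while only its network link is gone.

```python
def nagios_sees_primary(primary_running: bool, cable_plugged: bool) -> bool:
    # Nagios can only observe Host A over the network, so an unplugged
    # cable looks exactly like a dead service or host.
    return primary_running and cable_plugged


def naive_failover(timeline: list[tuple[bool, bool]]) -> list[int]:
    """Replay one (primary_running, cable_plugged) pair per check and
    return how many Active instances are running after each check."""
    passive_running = False
    active_counts = []
    for primary_running, cable_plugged in timeline:
        if not nagios_sees_primary(primary_running, cable_plugged):
            passive_running = True  # naive handler starts SERVICE_2
        active_counts.append(int(primary_running) + int(passive_running))
    return active_counts
```

With the cable pulled for two checks while SERVICE_1 keeps running, naive_failover([(True, True), (True, False), (True, False), (True, True)]) returns [1, 2, 2, 2]: the overlap starts as soon as Nagios loses sight of Host A and does not end when the cable goes back in.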

I can't think of any way to automate that side of things - but does
the general concept of having Nagios start up a Passive partner make
sense?

Thanks for any insight you have,

Chris



Re: [Nagios-users] Nagios as a Service Resiliency Manager

2009-12-10 Thread Marcel
Maybe this would help:
http://onlamp.com/onlamp/2006/05/25/self-healing-networks.html
