Re: [Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread Florian Haas
On 2011-07-07 13:52, James Smith wrote:
> Hi,
> 
> I appreciate that, but it doesn't answer the question.

Then maybe I misunderstood the question. I had interpreted it to mean
"why doesn't my cluster automatically fail over under high load?" --
perhaps you can rephrase to clarify.

> What I'm getting at, is there are multiple scenarios where a system 
> can fail but in my test scenario I was forcing high load.  My application 
> wouldn't, in a working scenario, ever cause this type of load unless there 
> was a very serious issue that would warrant failover.

Er, how can you be so sure? What if you just had a ton of users (or
client services) hammering your application? That would cause high
load, but it would clearly _not_ warrant failover -- since after you
fail over, the other node would be hammered just as much.

> So in this scenario I 
> want pacemaker to be able to handle this accordingly without the 
> need to configure additional services entirely separate to the working of 
> pacemaker.

Now please define how exactly Pacemaker would be handling this
"accordingly."

> For example, it's easy to assume the monitor operations on the RAs can 
> handle this already.  The slave should be initiating a monitor operation 
> against the master to see if its services are still responding.

I'm afraid you're missing the fact that in Pacemaker "a slave" does not
initiate a monitor operation "against the master". What makes you think
that it does? Monitor operations are always run locally. Only very few
resource agents are configurable as master/slave sets. _Some_ of those
can be configured to have a slave contact a master during monitoring
(like ocf:heartbeat:mysql), some never do (like ocf:linbit:drbd).
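
To illustrate what "run locally" implies, here is a minimal Python sketch
of a monitor-style check that executes a probe command on the same node
that hosts the resource. It is not how any real resource agent is written;
the function name local_monitor, the probe commands and the choice to
report a timeout as a generic error are assumptions made for illustration.
Only the two OCF return code values (0 and 1) are taken from the OCF
resource agent conventions.

```python
import subprocess
import sys

# OCF return codes used by resource agents (values per the OCF conventions).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def local_monitor(probe_cmd, timeout_s):
    """Run a probe command on this node and map the outcome to an OCF code.

    Hypothetical helper: if the node is so starved that the probe cannot
    be started (fork/exec fails) or does not finish in time, the check
    itself fails here, on the local node. No other node is consulted.
    """
    try:
        result = subprocess.run(
            probe_cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # Assumption: this sketch reports a timed-out probe as a generic error.
        return OCF_ERR_GENERIC
    except OSError:
        # The probe could not be started at all (e.g. no processes left).
        return OCF_ERR_GENERIC
    return OCF_SUCCESS if result.returncode == 0 else OCF_ERR_GENERIC


if __name__ == "__main__":
    # A probe that returns immediately, and one that outlives its timeout.
    print(local_monitor([sys.executable, "-c", "pass"], 10))
    print(local_monitor([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

The point of the sketch is only that the check depends on the local node
being able to start and finish a process in time, which is exactly what a
fork bomb prevents.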

> But it seems only the 
> master does this,

No. All nodes do.

> but of course the master is foobared so never responds, 
> so failover never occurs.  Surely I'm not the only one that sees this as 
> rather flawed?

So what would your preferred behavior be? Pacemaker failing over in case
load is high? That's a possibility and could be done via the system
health feature and an appropriate resource agent, but even if that
happens, you stand a pretty good chance -- even though I realize you
don't believe this -- that it is your application that causes this high
load, and then failover makes matters worse, not better.
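
As a rough sketch of what a load-based health check could compute, the
Python below maps the 1-minute load average per CPU to a colour. The
colour names mirror the red/yellow/green values used by Pacemaker's node
health attributes; the function names and both thresholds are hypothetical
and would need tuning, and the code does not talk to the cluster at all.

```python
import os


def classify_load(load_1min, cpu_count, yellow_ratio=1.0, red_ratio=4.0):
    """Map a load average to "green", "yellow" or "red".

    The ratios are illustrative assumptions: load per CPU at or above
    yellow_ratio is "yellow", at or above red_ratio is "red".
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    per_cpu = load_1min / cpu_count
    if per_cpu >= red_ratio:
        return "red"
    if per_cpu >= yellow_ratio:
        return "yellow"
    return "green"


def current_health():
    """Classify this node's current 1-minute load average (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return classify_load(load_1min, os.cpu_count() or 1)


if __name__ == "__main__":
    print(current_health())
```

Note that a check like this cannot tell a fork bomb from a legitimately
busy application, which is the caveat above: failing over on "red" may
simply move the load to the other node.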

Cheers,
Florian



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread James Smith
Hi,

I appreciate that, but it doesn't answer the question.

What I'm getting at, is there are multiple scenarios where a system 
can fail but in my test scenario I was forcing high load.  My application 
wouldn't, in a working scenario, ever cause this type of load unless there 
was a very serious issue that would warrant failover.  So in this scenario I 
want pacemaker to be able to handle this accordingly without the 
need to configure additional services entirely separate to the working of 
pacemaker.

For example, it's easy to assume the monitor operations on the RAs can 
handle this already.  The slave should be initiating a monitor operation 
against the master to see if its services are still responding.  But it 
seems only the master does this, and of course the master is foobared so 
never responds, so failover never occurs.  Surely I'm not the only one 
that sees this as rather flawed?

Regards,
James

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Florian Haas
Sent: 07 July 2011 11:59
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Forkbomb not initiating failover

On 2011-07-07 11:59, James Smith wrote:
> Hi,
> 
> Summary: Two node cluster running DRBD, IET with a floating IP and stonith 
> enabled.
> 
> All this works well, I can kernel panic the machine, kill individual 
> PIDs (for example IET) which then invoke failover.  However, when I forkbomb 
> the master, nothing happens.
> The box is dead, the services stop responding etc, but pacemaker does 
> not recognise this and therefore failover does not occur.
> 
> Very occasionally it will fence and invoke failover after several 
> minutes or even longer, which is no good at all.
> 
> To me, it seems extremely odd pacemaker itself does not automatically 
> incorporate system health checks that can detect such a scenario.  
> I've raised this a couple of times, but the suggestion is to run 
> watchdog or create an RA to do resource checking.  Watchdog certainly does 
> its job and is easy to configure, but this seems flawed to me.

Please refer to:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/70081#70081

Cheers,
Florian



Re: [Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread Florian Haas
On 2011-07-07 11:59, James Smith wrote:
> Hi,
> 
> Summary: Two node cluster running DRBD, IET with a floating IP and stonith 
> enabled.
> 
> All this works well, I can kernel panic the machine, kill individual 
> PIDs (for example IET) which then invoke failover.  However, when I 
> forkbomb the master, nothing happens.
> The box is dead, the services stop responding etc, but pacemaker does 
> not recognise this and therefore failover does not occur.
> 
> Very occasionally it will fence and invoke failover after several 
> minutes or even longer, which is no good at all.
> 
> To me, it seems extremely odd pacemaker itself does not automatically 
> incorporate system health checks that can detect such a scenario.  
> I've raised this a couple of times, but the suggestion is to run 
> watchdog or create an RA to do resource checking.  Watchdog certainly 
> does its job and is easy to configure, but this seems flawed to me.

Please refer to:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/70081#70081

Cheers,
Florian




[Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread James Smith
Hi,

Summary: Two node cluster running DRBD, IET with a floating IP and stonith 
enabled.

All this works well, I can kernel panic the machine, kill individual 
PIDs (for example IET) which then invoke failover.  However, when I 
forkbomb the master, nothing happens.
The box is dead, the services stop responding etc, but pacemaker does 
not recognise this and therefore failover does not occur.

Very occasionally it will fence and invoke failover after several 
minutes or even longer, which is no good at all.

To me, it seems extremely odd pacemaker itself does not automatically 
incorporate system health checks that can detect such a scenario.  
I've raised this a couple of times, but the suggestion is to run 
watchdog or create an RA to do resource checking.  Watchdog certainly 
does its job and is easy to configure, but this seems flawed to me.

Regards,
James