Re: [Linux-HA] Forkbomb not initiating failover
On 2011-07-07 13:52, James Smith wrote:
> Hi,
>
> I appreciate that, but it doesn't answer the question.

Then maybe I misunderstood the question. I had interpreted it to mean
"why doesn't my cluster automatically fail over under high load?" --
perhaps you can rephrase to clarify.

> What I'm getting at is that there are multiple scenarios where a
> system can fail, but in my test scenario I was forcing high load. My
> application wouldn't, in a working scenario, ever cause this type of
> load unless there was a very serious issue that would warrant
> failover.

Er, how can you be so sure? What if you just had a ton of users (or
client services) hammering your application? That would cause high
load, but it would clearly _not_ warrant failover, since after the
failover the other node would be hammered just as much.

> So in this scenario I want pacemaker to be able to handle this
> accordingly without the need to configure additional services
> entirely separate to the working of pacemaker.

Now please define how exactly Pacemaker would be handling this
"accordingly."

> For example, it's easy to assume the monitor operations on the RAs
> can handle this already. The slave should be initiating a monitor
> operation against the master to see if its services are still
> responding.

I'm afraid you're missing the fact that in Pacemaker "a slave" does not
initiate a monitor operation "against the master". What makes you think
that it does? Monitor operations are always run locally. Only very few
resource agents can be configured as master/slave sets. _Some_ of those
can be configured to have a slave contact a master during monitoring
(like ocf:heartbeat:mysql), some never do (like ocf:linbit:drbd).

> But it seems only the master does this,

No. All nodes do.

> but of course the master is foobared so never responds, so failover
> never occurs. Surely I'm not the only one that sees this as rather
> flawed?

So what would your preferred behavior be? Pacemaker failing over in
case load is high? That's a possibility and could be done via the
system health feature and an appropriate resource agent, but even if
that happens, you stand a pretty good chance -- even though I realize
you don't believe this -- that it is your application that causes this
high load, and then failover makes matters worse, not better.

Cheers,
Florian

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
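The kind of local check that a load-based health agent might run can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `classify_load` and `current_health` and the per-CPU thresholds are hypothetical, and the code does not talk to Pacemaker at all. The green/yellow/red scale is borrowed from the values used by Pacemaker's node-health attributes.

```python
import os


def classify_load(load_1min: float, cpu_count: int,
                  yellow: float = 1.0, red: float = 4.0) -> str:
    """Map the 1-minute load average per CPU to a health colour.

    The yellow/red thresholds are hypothetical defaults chosen for
    illustration, not values recommended by Pacemaker.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    per_cpu = load_1min / cpu_count
    if per_cpu >= red:
        return "red"
    if per_cpu >= yellow:
        return "yellow"
    return "green"


def current_health() -> str:
    """Classify the load of the local node (Unix-like systems only)."""
    load_1min, _, _ = os.getloadavg()
    return classify_load(load_1min, os.cpu_count() or 1)
```

Calling `current_health()` on a Unix-like system returns one of the three strings. Note that a check like this cannot tell a runaway process from legitimate client traffic, which is exactly the caveat raised above, and on a fork-bombed node the check itself may be unable to run.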
Re: [Linux-HA] Forkbomb not initiating failover
Hi,

I appreciate that, but it doesn't answer the question.

What I'm getting at is that there are multiple scenarios where a system
can fail, but in my test scenario I was forcing high load. My
application wouldn't, in a working scenario, ever cause this type of
load unless there was a very serious issue that would warrant failover.
So in this scenario I want pacemaker to be able to handle this
accordingly without the need to configure additional services entirely
separate to the working of pacemaker.

For example, it's easy to assume the monitor operations on the RAs can
handle this already. The slave should be initiating a monitor operation
against the master to see if its services are still responding. But it
seems only the master does this, but of course the master is foobared
so never responds, so failover never occurs. Surely I'm not the only
one that sees this as rather flawed?

Regards,
James

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Florian Haas
Sent: 07 July 2011 11:59
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Forkbomb not initiating failover

On 2011-07-07 11:59, James Smith wrote:
> Hi,
>
> Summary: Two node cluster running DRBD, IET with a floating IP and
> stonith enabled.
>
> All this works well, I can kernel panic the machine, kill individual
> PIDs (for example IET) which then invoke failover. However, when I
> forkbomb the master, nothing happens. The box is dead, the services
> stop responding etc, but pacemaker does not recognise this and
> therefore failover does not occur.
>
> Very occasionally it will fence and invoke failover after several
> minutes or even longer, which is no good at all.
>
> To me, it seems extremely odd pacemaker itself does not automatically
> incorporate system health checks that can detect such a scenario.
> I've raised this a couple of times, but the suggestion is to run
> watchdog or create an RA to do resource checking. Watchdog certainly
> does its job and is easy to configure, but this seems flawed to me.

Please refer to:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/70081#70081

Cheers,
Florian
Re: [Linux-HA] Forkbomb not initiating failover
On 2011-07-07 11:59, James Smith wrote:
> Hi,
>
> Summary: Two node cluster running DRBD, IET with a floating IP and
> stonith enabled.
>
> All this works well, I can kernel panic the machine, kill individual
> PIDs (for example IET) which then invoke failover. However, when I
> forkbomb the master, nothing happens. The box is dead, the services
> stop responding etc, but pacemaker does not recognise this and
> therefore failover does not occur.
>
> Very occasionally it will fence and invoke failover after several
> minutes or even longer, which is no good at all.
>
> To me, it seems extremely odd pacemaker itself does not automatically
> incorporate system health checks that can detect such a scenario.
> I've raised this a couple of times, but the suggestion is to run
> watchdog or create an RA to do resource checking. Watchdog certainly
> does its job and is easy to configure, but this seems flawed to me.

Please refer to:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/70081#70081

Cheers,
Florian
[Linux-HA] Forkbomb not initiating failover
Hi,

Summary: Two node cluster running DRBD, IET with a floating IP and
stonith enabled.

All this works well, I can kernel panic the machine, kill individual
PIDs (for example IET) which then invoke failover. However, when I
forkbomb the master, nothing happens. The box is dead, the services
stop responding etc, but pacemaker does not recognise this and
therefore failover does not occur.

Very occasionally it will fence and invoke failover after several
minutes or even longer, which is no good at all.

To me, it seems extremely odd pacemaker itself does not automatically
incorporate system health checks that can detect such a scenario. I've
raised this a couple of times, but the suggestion is to run watchdog or
create an RA to do resource checking. Watchdog certainly does its job
and is easy to configure, but this seems flawed to me.

Regards,
James