On 07/27/2012 11:48 AM, Cal Heldenbrand wrote:
> Why wouldn't my mem3 failover happen if it timed out stopping the cluster IP?
If a stop action fails, Pacemaker can't know whether the resource is running, not running, or in some other broken state. That leaves the cluster in an unknown state, and there is nothing Pacemaker can safely do with the resource, including starting it on another node.

Since Pacemaker thinks the node is broken (it failed to stop a resource as requested) but isn't sure, the solution is to get back to a known state by fencing the node: powering it off, resetting it, or otherwise cutting it off from the cluster. Configure a STONITH resource to do this.

Without STONITH, your only option is to fix the cause of the failure manually (high load, in this case), then run "crm resource cleanup ..." on any failed resources to tell Pacemaker it is safe to try again.
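To make the reasoning above concrete, here is a small Python sketch of the decision. It is only a toy model of the logic described in this message, not Pacemaker's actual code; the names `Outcome` and `after_stop` are made up for illustration.

```python
from enum import Enum


class Outcome(Enum):
    RECOVER_ELSEWHERE = "recover elsewhere"
    FENCE_THEN_RECOVER = "fence node, then recover elsewhere"
    BLOCKED = "blocked until manual cleanup"


def after_stop(stop_succeeded: bool, stonith_configured: bool) -> Outcome:
    """Hypothetical model of what can safely follow a stop action."""
    if stop_succeeded:
        # The resource is known to be stopped, so starting it on
        # another node is safe.
        return Outcome.RECOVER_ELSEWHERE
    if stonith_configured:
        # State unknown: power off or reset the node first, which puts
        # the cluster back into a known state.
        return Outcome.FENCE_THEN_RECOVER
    # State unknown and no fencing available: starting the resource
    # elsewhere could leave it active on two nodes at once, so nothing
    # moves until an administrator fixes the cause and cleans up.
    return Outcome.BLOCKED


if __name__ == "__main__":
    for stop_ok, stonith in [(True, False), (False, True), (False, False)]:
        print(stop_ok, stonith, "->", after_stop(stop_ok, stonith).value)
```

The last case (stop failed, no STONITH) is the situation in the question: the failover stays blocked until someone intervenes by hand.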
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org