On 24/04/2013, at 5:34 AM, Brian J. Murrell <br...@interlinx.bc.ca> wrote:

> Using pacemaker 1.1.8 on RHEL 6.4, I did a test where I just killed
> (-KILL) corosync on a peer node.  Pacemaker seemed to take a long time
> to transition to stonithing it though after noticing it was AWOL:

[snip]

> As you can see, 3 minutes and 10 seconds went by before pacemaker
> transitioned from noticing the node unresponsive to stonithing it.
> 
> This smacks of some kind of mis-configured timeout but I'm not aware
> of any timeout that would have this effect.
> 
> Thoughts?
> b.

Almost certainly you are hitting:

    https://bugzilla.redhat.com/show_bug.cgi?id=951340

I am doing my best to convince the people who make the decisions that this is
worthy of an update before 6.5.
The mystery at the moment is why some clusters (i.e. all the ones we tested
internally) seem unaffected.
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
