Hello everybody,

I need to be able to bring down my network interface (as a network failure test) and bring it up again a few seconds later, without my DRBD cluster going nuts and creating split brains.
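A minimal Python sketch of such a failure test, assuming iproute2 is installed and the script runs as root. The interface name `eth0` and the function name `flap_link` are hypothetical; `run` and `sleep` are injectable so the command sequence can be dry-run without touching the real link.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def flap_link(
    iface: str,
    down_seconds: float,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Take `iface` down, wait `down_seconds`, then bring it back up.

    Uses `ip link set dev IFACE down|up` (iproute2, needs root).
    Returns the commands that were issued, in order.
    """
    issued: List[List[str]] = []
    down = ["ip", "link", "set", "dev", iface, "down"]
    up = ["ip", "link", "set", "dev", iface, "up"]

    run(down)
    issued.append(down)
    try:
        # Simulated network outage.
        sleep(down_seconds)
    finally:
        # Always try to restore the link, even if interrupted while waiting.
        run(up)
        issued.append(up)
    return issued
```

On a test node (as root) this would be called as `flap_link("eth0", down_seconds=5)`, with `eth0` replaced by the interface that carries the cluster traffic.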
I was advised to use ocf:pacemaker:ping, so I started to integrate it into my configuration: http://pastebin.com/raw.php?i=iyp3URkP

The problem is that it kind of works, but not the way I need it to. The ping status is not rechecked right _before_ the cluster tries to promote the DRBD resources. It should do a fast ping check and continue if that succeeds, but _not_ promote any DRBD resources when the check stalls or fails.

What happens instead: the ping has been returning good values right up until the network failure, and when the failure occurs the cluster still thinks the ping status is good and promotes the disk. A few seconds later the ping status changes to reflect the network failure, but by then all the damage has already been done...

I must be doing something _terribly_ wrong, since I can't believe a Pacemaker/Corosync cluster is unable to survive a network glitch (a short network failure) without all kinds of split brains and losing the node.

Thanks in advance,

Kind regards,

Jelle de Jong
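A simplified Python model (not Pacemaker code) of the race described above: the promotion decision reads a node attribute that is only updated after a delay, so for a few seconds after the link drops it still shows a good value. The class and function names are hypothetical, and the lag is reduced to a single delay standing in for the ping agent's `dampen` parameter; a real cluster also adds the monitor interval and ping timeouts on top of it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DampenedAttribute:
    """Node attribute whose new value is only published after `dampen` seconds."""
    dampen: float
    published: int
    _pending: Optional[int] = field(default=None, init=False)
    _pending_since: float = field(default=0.0, init=False)

    def observe(self, value: int, now: float) -> None:
        # A monitor result arrives.
        if value == self.published:
            # Back to the published value: cancel any pending change.
            self._pending = None
        elif value != self._pending:
            # New value: start the dampening timer.
            self._pending = value
            self._pending_since = now

    def read(self, now: float) -> int:
        # Publish the pending value only once the dampen delay has elapsed.
        if self._pending is not None and now - self._pending_since >= self.dampen:
            self.published = self._pending
            self._pending = None
        return self.published


def may_promote_cached(attr: DampenedAttribute, now: float) -> bool:
    # Decision based on the last published attribute value: may be stale.
    return attr.read(now) > 0


def may_promote_fresh(probe: Callable[[], bool], attempts: int = 2) -> bool:
    # Decision based on a fresh reachability check right before promoting.
    # `probe` is expected to enforce its own timeout, so a stalled check
    # counts as a failure; promotion is allowed only if some attempt succeeds.
    return any(probe() for _ in range(attempts))
```

With `dampen=5.0`, a link that fails at t=0 still reads as reachable at t=2, so `may_promote_cached` allows promotion while `may_promote_fresh` with a failing probe does not. The sketch only contrasts the two decision rules; it is not a hook into Pacemaker's promotion logic.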
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker