Hello everybody,

I need to be able to bring down my network interface (network failure
test) and a few seconds later bring it up again, without my drbd cluster
going nuts and creating split brains.

I was advised to use ocf:pacemaker:ping, so I started to integrate this
into my configuration: http://pastebin.com/raw.php?i=iyp3URkP

Now the problem is that it kind of works, but not the way I need it to.

The ping status is not rechecked right _before_ it tries to promote the
drbd resources. It should do a fast ping check and continue if
successful, but _not_ promote any drbd resources when the check stalls
or fails.

The problem is that the ping has been returning good values up until
the network failure, and when the failure occurs the cluster still
thinks the ping status is good and promotes the disk. Only a few seconds
later does the ping status change to indicate the network failure, but
by then all the damage is already done...
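
The timing window described above can be sketched with a toy model. This
is not Pacemaker code: the function names are hypothetical, and the 10 s
monitor interval and 5 s dampen delay are assumed values for illustration,
not taken from the configuration linked above. The model assumes the ping
monitor runs at fixed intervals starting at t = 0, that the first run at
or after the failure notices it, and that the attribute update is delayed
by the dampen period. It ignores the time the ping itself takes.

```python
def first_detection_time(failure_at, monitor_interval, dampen):
    """Time (seconds) at which the cluster first sees the ping status drop.

    Toy model: monitors run at t = 0, interval, 2 * interval, ...
    """
    # Ceiling division: index of the first monitor run at or after the failure.
    run_index = -(-failure_at // monitor_interval)
    # The attribute change only becomes visible after the dampen delay.
    return run_index * monitor_interval + dampen


def promotes_on_stale_status(failure_at, promote_at, monitor_interval, dampen):
    """True if a promotion at promote_at still sees the old, good ping status."""
    detected_at = first_detection_time(failure_at, monitor_interval, dampen)
    return failure_at <= promote_at < detected_at


if __name__ == "__main__":
    # Assumed values: failure at t=12s, monitor every 10s, dampen 5s.
    print(first_detection_time(12, 10, 5))          # 25
    print(promotes_on_stale_status(12, 14, 10, 5))  # True
```

With these assumed numbers the failure at t = 12 s is not visible until
t = 25 s, so a promotion decided at t = 14 s is made on stale ping data.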

I must be doing something _terribly_ wrong, since I can't believe a
pacemaker/corosync cluster shouldn't be able to survive a network glitch
(short network failures) without all kinds of split brains and losing the
node.

Thanks in advance,

Kind regards,

Jelle de Jong

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
