Re: [Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary power loss

Andrew Martin Tue, 23 Oct 2012 08:10:21 -0700

Hello, 

In the Clusters from Scratch documentation, allow-two-primaries is set in 
the DRBD configuration for an active/passive cluster: 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Clusters_from_Scratch/index.html#_write_the_drbd_config


"TODO: Explain the reason for the allow-two-primaries option" 

Is the reason for allow-two-primaries in this active/passive cluster (using 
ext4, a non-cluster filesystem) to allow for failover in the type of situation 
I have described (where the old primary/master is suddenly offline, as with a 
power supply failure)? Are split-brains prevented because Pacemaker ensures 
that only one node is promoted to Primary at any time? 

Is it possible to recover from such a failure without allow-two-primaries? 

Thanks, 

Andrew 

----- Original Message -----

From: "Andrew Martin" <amar...@xes-inc.com> 
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org> 
Sent: Friday, October 19, 2012 10:45:04 AM 
Subject: [Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary power 
loss 


Hello, 

I have a 3-node Pacemaker + Corosync cluster with 2 "real" nodes, node0 and 
node1, running a DRBD resource (single-primary) and the 3rd node in standby 
acting as a quorum node. If node0 is running the DRBD resource, and thus is 
DRBD primary, and its power supply fails, will the DRBD resource be promoted to 
primary on node1? 

If I simply cut the DRBD replication link, node1 reports the following state: 

Role:             Secondary/Unknown 
Disk State:       UpToDate/DUnknown 
Connection State: WFConnection 

I cannot manually promote the DRBD resource because the peer is not outdated: 
0: State change failed: (-7) Refusing to be Primary while peer is not outdated 
Command 'drbdsetup 0 primary' terminated with exit code 11 

I have configured the CIB-based crm-fence-peer.sh utility in my drbd.conf: 
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; 
but I do not believe it would be applicable in this scenario. 
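
For reference, the fence-peer handler is usually paired with a fencing policy in the disk section and a matching unfence handler. Below is a minimal sketch of that part of drbd.conf, assuming DRBD 8.4-style syntax. The resource name r0 is hypothetical, and the fencing and after-resync-target lines are assumptions about a typical setup. Only the fence-peer line comes from the configuration quoted above.

```
# Sketch only: "r0" is a placeholder resource name, not taken from this thread.
resource r0 {
  disk {
    # Assumed policy: call the fence-peer handler when the peer is lost,
    # without freezing I/O on the surviving node.
    fencing resource-only;
  }
  handlers {
    # From the configuration above: places a constraint in the CIB so the
    # disconnected peer cannot be promoted.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Assumed companion handler: removes that constraint once the peer
    # has resynchronized.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

Without a fencing policy in the disk section, DRBD does not call the fence-peer handler at all, so the handler line on its own may have no effect.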

If node0 goes offline like this and doesn't come back (e.g. after a STONITH), 
does Pacemaker have a way to tell node1 that its peer is outdated and to 
proceed with promoting the resource to primary? 

Thanks, 

Andrew 

_______________________________________________ 
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
http://oss.clusterlabs.org/mailman/listinfo/pacemaker 

Project Home: http://www.clusterlabs.org 
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
Bugs: http://bugs.clusterlabs.org

