Hi All,

In the October archives I saw the issue reported by Felix Zachlod at 
http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022653.html, and 
the same thing is now happening to me on a dual-primary DRBD setup.

My OS is RHEL 6.6 and the software versions I am using are:
pacemaker-1.1.12-4.el6.x86_64
corosync-1.4.7-1.el6.x86_64
cman-3.0.12.1-68.el6.x86_64
drbd84-utils-8.9.1-1.el6.elrepo.x86_64
kmod-drbd84-8.4.5-2.el6.elrepo.x86_64
gfs2-utils-3.0.12.1-68.el6.x86_64

First, let me describe my existing resources. I have three: drbd, dlm for 
gfs2, and HomeFS.

Master: HomeDataClone
  Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 
notify=true interval=0s
  Resource: HomeData (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=homedata
   Operations: start interval=0s timeout=240 (HomeData-start-timeout-240)
               promote interval=0s (HomeData-promote-interval-0s)
               demote interval=0s timeout=90 (HomeData-demote-timeout-90)
               stop interval=0s timeout=100 (HomeData-stop-timeout-100)
               monitor interval=60s (HomeData-monitor-interval-60s)
Clone: HomeFS-clone
  Meta Attrs: start-delay=30s target-role=Stopped
  Resource: HomeFS (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/drbd/by-res/homedata directory=/home fstype=gfs2
   Operations: start interval=0s timeout=60 (HomeFS-start-timeout-60)
               stop interval=0s timeout=60 (HomeFS-stop-timeout-60)
               monitor interval=20 timeout=40 (HomeFS-monitor-interval-20)
Clone: dlm-clone
  Meta Attrs: clone-max=2 clone-node-max=1 start-delay=0s
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
               stop interval=0s timeout=100 (dlm-stop-timeout-100)
               monitor interval=60s (dlm-monitor-interval-60s)
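
For completeness, these resources were created with pcs commands roughly like 
the following (reconstructed from the configuration above, so option order and 
details may differ slightly from what I actually ran):

pcs resource create HomeData ocf:linbit:drbd drbd_resource=homedata \
    op monitor interval=60s
pcs resource master HomeDataClone HomeData \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs resource create dlm ocf:pacemaker:controld \
    op monitor interval=60s clone clone-max=2 clone-node-max=1
pcs resource create HomeFS ocf:heartbeat:Filesystem \
    device=/dev/drbd/by-res/homedata directory=/home fstype=gfs2 \
    op monitor interval=20 timeout=40 clone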


But when I start the cluster under normal conditions, it causes a DRBD 
split-brain on each node. From the log I can see it is the same case as 
Felix's: Pacemaker promotes DRBD to primary while the resource is still 
waiting for the connection handshake between the nodes.

Nov 12 11:37:32 node002 kernel: block drbd1: disk( Attaching -> UpToDate )
Nov 12 11:37:32 node002 kernel: block drbd1: attached to UUIDs 
C9630089EC3B58CC:0000000000000000:B4653C665EBC0DBB:B4643C665EBC0DBA
Nov 12 11:37:32 node002 kernel: drbd homedata: conn( StandAlone -> Unconnected )
Nov 12 11:37:32 node002 kernel: drbd homedata: Starting receiver thread (from 
drbd_w_homedata [22531])
Nov 12 11:37:32 node002 kernel: drbd homedata: receiver (re)started
Nov 12 11:37:32 node002 kernel: drbd homedata: conn( Unconnected -> 
WFConnection )
Nov 12 11:37:32 node002 attrd[22340]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: master-HomeData (1000)
Nov 12 11:37:32 node002 attrd[22340]:   notice: attrd_perform_update: Sent 
update 17: master-HomeData=1000
Nov 12 11:37:32 node002 crmd[22342]:   notice: process_lrm_event: Operation 
HomeData_start_0: ok (node=node002, call=18, rc=0, cib-update=13, 
confirmed=true)
Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation 
HomeData_notify_0: ok (node=node002, call=19, rc=0, cib-update=0, 
confirmed=true)
Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation 
HomeData_notify_0: ok (node=node002, call=20, rc=0, cib-update=0, 
confirmed=true)
Nov 12 11:37:33 node002 kernel: block drbd1: role( Secondary -> Primary )
Nov 12 11:37:33 node002 kernel: block drbd1: new current UUID 
58F02AE0E03C1C91:C9630089EC3B58CC:B4653C665EBC0DBB:B4643C665EBC0DBA
Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation 
HomeData_promote_0: ok (node=node002, call=21, rc=0, cib-update=14, 
confirmed=true)
Nov 12 11:37:33 node002 attrd[22340]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: master-HomeData (10000)
Nov 12 11:37:33 node002 attrd[22340]:   notice: attrd_perform_update: Sent 
update 23: master-HomeData=10000
Nov 12 11:37:33 node002 crmd[22342]:   notice: process_lrm_event: Operation 
HomeData_notify_0: ok (node=node002, call=22, rc=0, cib-update=0, 
confirmed=true)
Nov 12 11:37:33 node002 kernel: drbd homedata: Handshake successful: Agreed 
network protocol version 101
Nov 12 11:37:33 node002 kernel: drbd homedata: Agreed to support TRIM on 
protocol level
Nov 12 11:37:33 node002 kernel: drbd homedata: Peer authenticated using 20 
bytes HMAC
Nov 12 11:37:33 node002 kernel: drbd homedata: conn( WFConnection -> 
WFReportParams )
Nov 12 11:37:33 node002 kernel: drbd homedata: Starting asender thread (from 
drbd_r_homedata [22543])
Nov 12 11:37:33 node002 kernel: block drbd1: drbd_sync_handshake:
Nov 12 11:37:33 node002 kernel: block drbd1: self 
58F02AE0E03C1C91:C9630089EC3B58CC:B4653C665EBC0DBB:B4643C665EBC0DBA bits:0 
flags:0
Nov 12 11:37:33 node002 kernel: block drbd1: peer 
0FAA8E4B66817421:C9630089EC3B58CD:B4653C665EBC0DBA:B4643C665EBC0DBA bits:0 
flags:0
Nov 12 11:37:33 node002 kernel: block drbd1: uuid_compare()=100 by rule 90
Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
initial-split-brain minor-1
Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
initial-split-brain minor-1 exit code 0 (0x0)
Nov 12 11:37:33 node002 kernel: block drbd1: Split-Brain detected but 
unresolved, dropping connection!
Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
split-brain minor-1
Nov 12 11:37:33 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
split-brain minor-1 exit code 0 (0x0)
Nov 12 11:37:33 node002 kernel: drbd homedata: conn( WFReportParams -> 
Disconnecting )
Nov 12 11:37:33 node002 kernel: drbd homedata: error receiving ReportState, e: 
-5 l: 0!
Nov 12 11:37:33 node002 kernel: drbd homedata: asender terminated
Nov 12 11:37:33 node002 kernel: drbd homedata: Terminating drbd_a_homedata
Nov 12 11:37:33 node002 kernel: drbd homedata: Connection closed
Nov 12 11:37:33 node002 kernel: drbd homedata: conn( Disconnecting -> 
StandAlone )
Nov 12 11:37:33 node002 kernel: drbd homedata: receiver terminated
Nov 12 11:37:33 node002 kernel: drbd homedata: Terminating drbd_r_homedata

But if I disable the other two resources and have only the HomeDataClone 
resource enabled at cluster startup, the DRBD device comes up connected and 
both nodes are promoted to primary. Here is the log:

Nov 12 12:38:11 node002 kernel: drbd homedata: Starting worker thread (from 
drbdsetup-84 [26752])
Nov 12 12:38:11 node002 kernel: block drbd1: disk( Diskless -> Attaching )
Nov 12 12:38:11 node002 kernel: drbd homedata: Method to ensure write ordering: 
flush
Nov 12 12:38:11 node002 kernel: block drbd1: max BIO size = 1048576
Nov 12 12:38:11 node002 kernel: block drbd1: drbd_bm_resize called with 
capacity == 314563128
Nov 12 12:38:11 node002 kernel: block drbd1: resync bitmap: bits=39320391 
words=614382 pages=1200
Nov 12 12:38:11 node002 kernel: block drbd1: size = 150 GB (157281564 KB)
Nov 12 12:38:11 node002 kernel: block drbd1: recounting of set bits took 
additional 7 jiffies
Nov 12 12:38:11 node002 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync 
by on disk bit-map.
Nov 12 12:38:11 node002 kernel: block drbd1: disk( Attaching -> UpToDate )
Nov 12 12:38:11 node002 kernel: block drbd1: attached to UUIDs 
01FA7FA3D219A8B4:0000000000000000:0FAB8E4B66817420:0FAA8E4B66817421
Nov 12 12:38:11 node002 kernel: drbd homedata: conn( StandAlone -> Unconnected )
Nov 12 12:38:11 node002 kernel: drbd homedata: Starting receiver thread (from 
drbd_w_homedata [26753])
Nov 12 12:38:11 node002 kernel: drbd homedata: receiver (re)started
Nov 12 12:38:11 node002 kernel: drbd homedata: conn( Unconnected -> 
WFConnection )
Nov 12 12:38:11 node002 attrd[26577]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: master-HomeData (1000)
Nov 12 12:38:11 node002 attrd[26577]:   notice: attrd_perform_update: Sent 
update 17: master-HomeData=1000
Nov 12 12:38:11 node002 crmd[26579]:   notice: process_lrm_event: Operation 
HomeData_start_0: ok (node=node002, call=18, rc=0, cib-update=12, 
confirmed=true)
Nov 12 12:38:11 node002 crmd[26579]:   notice: process_lrm_event: Operation 
HomeData_notify_0: ok (node=node002, call=19, rc=0, cib-update=0, 
confirmed=true)
Nov 12 12:38:11 node002 kernel: drbd homedata: Handshake successful: Agreed 
network protocol version 101
Nov 12 12:38:11 node002 kernel: drbd homedata: Agreed to support TRIM on 
protocol level
Nov 12 12:38:11 node002 kernel: drbd homedata: Peer authenticated using 20 
bytes HMAC
Nov 12 12:38:11 node002 kernel: drbd homedata: conn( WFConnection -> 
WFReportParams )
Nov 12 12:38:11 node002 kernel: drbd homedata: Starting asender thread (from 
drbd_r_homedata [26764])
Nov 12 12:38:11 node002 kernel: block drbd1: drbd_sync_handshake:
Nov 12 12:38:11 node002 kernel: block drbd1: self 
01FA7FA3D219A8B4:0000000000000000:0FAB8E4B66817420:0FAA8E4B66817421 bits:0 
flags:0
Nov 12 12:38:11 node002 kernel: block drbd1: peer 
4499ABF2AAE91DF2:01FA7FA3D219A8B5:0FAB8E4B66817421:0FAA8E4B66817421 bits:0 
flags:0
Nov 12 12:38:11 node002 kernel: block drbd1: uuid_compare()=-1 by rule 50
Nov 12 12:38:11 node002 kernel: block drbd1: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> 
UpToDate )
Nov 12 12:38:11 node002 kernel: block drbd1: receive bitmap stats 
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Nov 12 12:38:11 node002 kernel: block drbd1: send bitmap stats 
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Nov 12 12:38:11 node002 kernel: block drbd1: conn( WFBitMapT -> WFSyncUUID )
Nov 12 12:38:11 node002 kernel: block drbd1: updated sync uuid 
01FB7FA3D219A8B4:0000000000000000:0FAB8E4B66817420:0FAA8E4B66817421
Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
before-resync-target minor-1
Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
before-resync-target minor-1 exit code 0 (0x0)
Nov 12 12:38:11 node002 kernel: block drbd1: conn( WFSyncUUID -> SyncTarget ) 
disk( Outdated -> Inconsistent )
Nov 12 12:38:11 node002 kernel: block drbd1: Began resync as SyncTarget (will 
sync 0 KB [0 bits set]).
Nov 12 12:38:11 node002 kernel: block drbd1: Resync done (total 1 sec; paused 0 
sec; 0 K/sec)
Nov 12 12:38:11 node002 kernel: block drbd1: updated UUIDs 
4499ABF2AAE91DF2:0000000000000000:01FB7FA3D219A8B4:01FA7FA3D219A8B5
Nov 12 12:38:11 node002 kernel: block drbd1: conn( SyncTarget -> Connected ) 
disk( Inconsistent -> UpToDate )
Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
after-resync-target minor-1
Nov 12 12:38:11 node002 crm-unfence-peer.sh[26804]: invoked for homedata
Nov 12 12:38:11 node002 kernel: block drbd1: helper command: /sbin/drbdadm 
after-resync-target minor-1 exit code 0 (0x0)
Nov 12 12:38:12 node002 crmd[26579]:   notice: process_lrm_event: Operation 
dlm_stop_0: ok (node=node002, call=17, rc=0, cib-update=13, confirmed=true)
Nov 12 12:38:13 node002 crmd[26579]:   notice: process_lrm_event: Operation 
HomeData_notify_0: ok (node=node002, call=20, rc=0, cib-update=0, 
confirmed=true)
Nov 12 12:38:13 node002 kernel: block drbd1: peer( Secondary -> Primary )
Nov 12 12:38:13 node002 kernel: block drbd1: role( Secondary -> Primary )
Nov 12 12:38:13 node002 crmd[26579]:   notice: process_lrm_event: Operation 
HomeData_promote_0: ok (node=node002, call=21, rc=0, cib-update=14, 
confirmed=true)
Nov 12 12:38:13 node002 attrd[26577]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: master-HomeData (10000)
Nov 12 12:38:13 node002 attrd[26577]:   notice: attrd_perform_update: Sent 
update 23: master-HomeData=10000
Nov 12 12:38:13 node002 crmd[26579]:   notice: process_lrm_event: Operation 
HomeData_notify_0: ok (node=node002, call=22, rc=0, cib-update=0, 
confirmed=true)

Based on the result above, I then tried adding an ordering constraint to start 
HomeDataClone first and then dlm-clone, while still keeping HomeFS-clone 
disabled (the constraint is sketched after the status output below). The 
result is that DRBD comes up connected on both nodes and both are promoted to 
primary, but dlm-clone cannot be started after HomeDataClone, which leaves the 
dlm-clone service stopped and not running.

Master/Slave Set: HomeDataClone [HomeData]
     Masters: [ node001 node002 ]
Clone Set: HomeFS-clone [HomeFS]
     Stopped: [ node001 node002 ]
Clone Set: dlm-clone [dlm]
     Stopped: [ node001 node002 ]

Failed actions:
    dlm_start_0 on node001 'not configured' (6): call=21, status=complete, 
last-rc-change='Wed Nov 12 12:46:42 2014', queued=0ms, exec=59ms
    dlm_start_0 on node002 'not configured' (6): call=21, status=complete, 
last-rc-change='Wed Nov 12 12:46:40 2014', queued=0ms, exec=69ms
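
For reference, the ordering constraint I added was of roughly this form (pcs 
syntax reproduced from memory, so treat it as a sketch):

pcs constraint order start HomeDataClone then start dlm-clone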


Please help me with a couple of questions:

1. Why does the Pacemaker controld resource fail when I set an ordering 
constraint to start the drbd resource first and only then start controld?

2. Is there a configuration option to delay the DRBD promotion, i.e. start the 
drbd resource, then wait before promoting it to primary? (See the sketch below 
for the kind of thing I mean.)
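
What I have in mind is something like a start-delay on the promote operation, 
though I am not sure this is a supported or sensible way to do it (untested 
sketch):

pcs resource update HomeData op promote interval=0s start-delay=30s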

Right now I am stuck getting the cluster to start and manage all the resources 
without any errors. My temporary workaround is to start the drbd service 
manually before starting the cluster with pcs, roughly as shown below. Of 
course this is not best practice, so I would appreciate any advice or feedback 
on how to fix this issue.
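
For the record, that manual workaround looks roughly like this (command names 
as I recall them, to be adapted to your environment):

service drbd start        # on both nodes, so DRBD can connect first
drbd-overview             # wait until the resource reports Connected
pcs cluster start --all   # only then start the cluster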


Thanks in advance for any hints/advice.
