You need to configure cluster fencing (stonith) and the DRBD fencing handler; with both in place, the cluster can recover from this without manual intervention.
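On the DRBD side that means something along these lines in the resource
definition (a sketch, untested against your setup; the handler scripts ship
with drbd-utils, and since your logs show DRBD 9 they are the
crm-fence-peer.9.sh variants, so check what actually exists in
/usr/lib/drbd/ on your nodes):

  resource storage {
    disk {
      # on loss of the replication link, suspend I/O and call the
      # fence-peer handler before continuing without the peer
      fencing resource-and-stonith;
    }
    handlers {
      # adds a Pacemaker constraint so the outdated peer cannot be
      # promoted while it is unreachable ...
      fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
      # ... and removes the constraint again once the resync is done
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }
    # rest of your existing resource definition unchanged
  }

With allow-two-primaries and GFS2 on top this is not optional anyway:
dual-primary without a fencing handler is exactly the setup that produces
unresolved split brains like the one in your logs. Your vbox-fencing stonith
resource covers the node level; the handlers above tie DRBD into it.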
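Until that is in place, you should not need a full cluster shutdown to
recover. The usual manual procedure (per the DRBD User's Guide; I have not
tried it on your exact dual-primary setup, and with GFS2 the victim must be
unmounted first, e.g. via standby or maintenance-mode, before it can be
demoted) is to pick the node whose changes you are willing to throw away
and run, on that node:

  drbdadm disconnect storage
  drbdadm secondary storage
  drbdadm connect --discard-my-data storage

and on the surviving node, only if it dropped to StandAlone as well:

  drbdadm connect storage

The discarded node then resyncs from the survivor and can be brought back
into the cluster.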
2017-07-12 11:33 GMT+02:00 ArekW <[email protected]>:
> Hi,
> Can it be fixed that the drbd is entering split brain after cluster
> node recovery? After a few tests I saw the drbd recover, but in most
> situations (9/10) it didn't sync.
>
> 1. When a node is put to standby and then unstandby, everything works
> OK. The drbd syncs and goes to primary mode.
>
> 2. When a node is (hard) powered off, the stonith brings it up and
> eventually the node comes back online, but the drbd is in StandAlone
> state on the recovered node. I can only sync it manually, and that
> requires stopping the cluster.
>
> Logs:
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Handshake to peer 1 successful: Agreed network protocol version 112
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Starting ack_recv thread (from drbd_r_storage [28960])
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: Preparing cluster-wide state change 2237079084 (0->1 499/145)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: State change 2237079084: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: Committing cluster-wide state change 2237079084 (1ms)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: current_size: 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: c_size: 14679544 u_size: 0 d_size: 14679544 max_size: 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: la_size: 14679544 my_usize: 0 my_max_size: 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: my node_id: 0
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: node_id: 1 idx: 0 bm-uuid: 0x441536064ceddc92 flags: 0x10 max_size: 14679544 (DUnknown)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: calling drbd_determine_dev_size()
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: my node_id: 0
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: node_id: 1 idx: 0 bm-uuid: 0x441536064ceddc92 flags: 0x10 max_size: 14679544 (DUnknown)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: drbd_sync_handshake:
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: self 342BE98297943C35:441536064CEDDC92:69D98E1FCC2BB44C:E04101C6FF76D1CC bits:15450 flags:120
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: peer A8908796A7CCFF6E:CE6B672F4EDA6E78:69D98E1FCC2BB44C:E04101C6FF76D1CC bits:32768 flags:2
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: uuid_compare()=-100 by rule 100
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper command: /sbin/drbdadm initial-split-brain
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper command: /sbin/drbdadm initial-split-brain exit code 0 (0x0)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: Split-Brain detected but unresolved, dropping connection!
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper command: /sbin/drbdadm split-brain
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper command: /sbin/drbdadm split-brain exit code 0 (0x0)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: error receiving P_STATE, e: -5 l: 0!
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: ack_receiver terminated
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Terminating ack_recv thread
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Connection closed
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn( Disconnecting -> StandAlone )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Terminating receiver thread
>
> Config:
> resource storage {
>   protocol C;
>   meta-disk internal;
>   device /dev/drbd1;
>   syncer {
>     verify-alg sha1;
>   }
>   net {
>     allow-two-primaries;
>   }
>   on nfsnode1 {
>     disk /dev/storage/drbd;
>     address 10.0.2.15:7789;
>   }
>   on nfsnode2 {
>     disk /dev/storage/drbd;
>     address 10.0.2.4:7789;
>   }
> }
>
> pcs resource show StorageFS-clone
>  Clone: StorageFS-clone
>   Resource: StorageFS (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/drbd1 directory=/mnt/drbd fstype=gfs2
>    Operations: start interval=0s timeout=60 (StorageFS-start-interval-0s)
>                stop interval=0s timeout=60 (StorageFS-stop-interval-0s)
>                monitor interval=20 timeout=40 (StorageFS-monitor-interval-20)
>
> Full list of resources:
>
>  Master/Slave Set: StorageClone [Storage]
>      Masters: [ nfsnode1 nfsnode2 ]
>  Clone Set: dlm-clone [dlm]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>      ClusterIP:0 (ocf::heartbeat:IPaddr2): Started nfsnode2
>      ClusterIP:1 (ocf::heartbeat:IPaddr2): Started nfsnode1
>  Clone Set: StorageFS-clone [StorageFS]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: WebSite-clone [WebSite]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: nfs-group-clone [nfs-group]
>      Started: [ nfsnode1 nfsnode2 ]
>  Clone Set: ping-clone [ping]
>      Started: [ nfsnode1 nfsnode2 ]
>  vbox-fencing (stonith:fence_vbox): Started nfsnode2

--
  .~.
  /V\
 // \\
/(   )\
 ^`~'^
_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
