I have a two-node cluster running CentOS 6.8 with Pacemaker and DRBD. I built
the cluster following the "Clusters from Scratch" documentation, and I am
running into a problem where DRBD does not fail over to the other node when one
goes down. Here is my "pcs status" output before the failover is supposed to
happen:
----------------------------------------------------------------------------------------------------------------------
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep 6 14:50:21 2016          Last change: Tue Sep 6 14:50:17 2016 by root via crm_attribute on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP    (ocf::heartbeat:IPaddr2):       Started node1
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 ClusterFS      (ocf::heartbeat:Filesystem):    Started node1
 WebSite        (ocf::heartbeat:apache):        Started node1

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
    last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms

PCSD Status:
  node1: Online
  node2: Online
[root@node1 ~]#
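For context, I created the DRBD resource and its master/slave set following
the Clusters from Scratch steps, roughly like this (the exact parameters are
in the pastebin config linked below; the drbd_resource name here is just a
placeholder):

# DRBD resource agent, monitored periodically on both nodes
pcs resource create ClusterDB ocf:linbit:drbd \
    drbd_resource=wwwdata op monitor interval=60s

# master/slave wrapper so Pacemaker promotes exactly one node to Primary
pcs resource master ClusterDBclone ClusterDB \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true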
When I put node1 in standby, everything fails over except DRBD:
--------------------------------------------------------------------------------------
[root@node1 ~]# pcs cluster standby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep 6 14:53:45 2016          Last change: Tue Sep 6 14:53:37 2016 by root via cibadmin on node2
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Node node1: standby
Online: [ node2 ]

Full list of resources:

 Cluster_VIP    (ocf::heartbeat:IPaddr2):       Started node2
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Slaves: [ node2 ]
     Stopped: [ node1 ]
 ClusterFS      (ocf::heartbeat:Filesystem):    Stopped
 WebSite        (ocf::heartbeat:apache):        Started node2

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
    last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms

PCSD Status:
  node1: Online
  node2: Online
[root@node1 ~]#
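One thing I notice is that the old failed start of ClusterFS on node2 (from
13:15) is still listed. I assume a leftover failcount from that attempt could
be affecting placement on node2; if so, my understanding is that something
like this would show and clear it (I haven't confirmed this is related):

# show any accumulated failcount for the filesystem resource
pcs resource failcount show ClusterFS

# clear the failure history so the cluster re-evaluates placement
pcs resource cleanup ClusterFS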
I have pasted the contents of "/var/log/messages" here:
http://pastebin.com/0i0FMzGZ
Here is my configuration: http://pastebin.com/HqqBV90p
When I unstandby node1, it comes back as the DRBD master, and everything else
stays running on node2 (which is fine, because I haven't set up colocation
constraints for that; my understanding of the constraints I do have is
sketched below).
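For reference, here is my understanding of the constraints from Clusters from
Scratch that tie the filesystem to the DRBD master (the names match my
resources, but I may have gotten these wrong in my actual config, which could
be the whole problem):

# the filesystem must run on the node where DRBD is promoted to Master
pcs constraint colocation add ClusterFS with ClusterDBclone INFINITY with-rsc-role=Master

# DRBD must be promoted before the filesystem can be mounted
pcs constraint order promote ClusterDBclone then start ClusterFS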
Here is what I have after node1 is back:
-----------------------------------------------------
[root@node1 ~]# pcs cluster unstandby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep 6 14:57:46 2016          Last change: Tue Sep 6 14:57:42 2016 by root via cibadmin on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP    (ocf::heartbeat:IPaddr2):       Started node2
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 ClusterFS      (ocf::heartbeat:Filesystem):    Started node1
 WebSite        (ocf::heartbeat:apache):        Started node2

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
    last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms

PCSD Status:
  node1: Online
  node2: Online
[root@node1 ~]#
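In case it's useful, this is how I would check what DRBD itself thinks on each
node while node1 is in standby (the resource name is a placeholder; mine is in
the pastebin config):

# kernel-level view of DRBD connection state and roles on this node
cat /proc/drbd

# ask DRBD for its local/peer role on the resource (placeholder name)
drbdadm role wwwdata

# show the allocation/promotion scores Pacemaker computed
crm_simulate -sL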
Any help would be appreciated; I think there is something dumb that I'm missing.
Thank you.