On 09/06/2016 02:04 PM, Devin Ortner wrote:
> I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have
> been using the "Clusters from Scratch" documentation to create my cluster and
> I am running into a problem where DRBD is not failing over to the other node
> when one goes down. Here is my "pcs status" prior to when it is supposed to
> fail over:
The most up-to-date version of Clusters From Scratch targets CentOS 7.1, which has corosync 2, pcs, and a recent pacemaker:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html

There is an older version targeting Fedora 13, which has CMAN, corosync 1, the crm shell, and an older pacemaker:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html

Your system is in between, with CMAN, corosync 1, pcs, and a newer pacemaker, so you might want to compare the two guides as you go.

> ----------------------------------------------------------------------------------------------------------------------
>
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep 6 14:50:21 2016    Last change: Tue Sep 6 14:50:17 2016 by root via crm_attribute on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Cluster_VIP (ocf::heartbeat:IPaddr2): Started node1
> Master/Slave Set: ClusterDBclone [ClusterDB]
>     Masters: [ node1 ]
>     Slaves: [ node2 ]
> ClusterFS (ocf::heartbeat:Filesystem): Started node1
> WebSite (ocf::heartbeat:apache): Started node1
>
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete,
>   exitreason='none',
>   last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms

'unknown error' means the Filesystem resource agent returned an error status. Check the system log for messages from the resource agent to see what the error actually was.
>
> PCSD Status:
> node1: Online
> node2: Online
>
> [root@node1 ~]#
>
> When I put node1 in standby everything fails over except DRBD:
> --------------------------------------------------------------------------------------
>
> [root@node1 ~]# pcs cluster standby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep 6 14:53:45 2016    Last change: Tue Sep 6 14:53:37 2016 by root via cibadmin on node2
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
>
> Node node1: standby
> Online: [ node2 ]
>
> Full list of resources:
>
> Cluster_VIP (ocf::heartbeat:IPaddr2): Started node2
> Master/Slave Set: ClusterDBclone [ClusterDB]
>     Slaves: [ node2 ]
>     Stopped: [ node1 ]
> ClusterFS (ocf::heartbeat:Filesystem): Stopped
> WebSite (ocf::heartbeat:apache): Started node2
>
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete,
>   exitreason='none',
>   last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
>
>
> PCSD Status:
> node1: Online
> node2: Online
>
> [root@node1 ~]#
>
> I have pasted the contents of "/var/log/messages" here:
> http://pastebin.com/0i0FMzGZ
> Here is my Configuration: http://pastebin.com/HqqBV90p

One thing lacking in Clusters From Scratch is that master/slave resources such as ClusterDB should have two monitor operations, one for the master role and one for the slave role. Something like:

  op monitor interval=59s role=Master
  op monitor interval=60s role=Slave

Not sure if that will help your issue, but it's a good idea.

Another thing the guide should do differently is configure stonith before drbd. Once you have fencing working in pacemaker, take a look at LINBIT's DRBD User Guide for whatever version you installed ( https://www.drbd.org/en/doc ) and look for the Pacemaker chapter. It will describe how to connect the fencing between DRBD and Pacemaker's CIB.
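With the pcs version shipped on CentOS 6, the role-specific monitor operations can be added to an existing resource after the fact. A sketch (the resource name is taken from your config; the intervals are just examples, and only need to differ so pacemaker can tell the two operations apart):

```sh
# Add one monitor operation per role to the DRBD master/slave resource.
# The intervals must be different for each role.
pcs resource op add ClusterDB monitor interval=59s role=Master
pcs resource op add ClusterDB monitor interval=60s role=Slave

# Verify the operations were added:
pcs resource show ClusterDB
```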
Your constraints need a few tweaks: you have two "ClusterFS with ClusterDBclone" colocations, one with "with-rsc-role:Master" and one without. You want only the one with Master. Your "Cluster_VIP with ClusterDBclone" colocation should also specify the Master role.

When you colocate with a clone without specifying a role, the resource can run on any node where any instance of the clone is running (whether slave or master). In this case, you want those resources to run only with the master instance, so you need to say so explicitly. That could be the main source of your issue.

> When I unstandby node1, it comes back as the master for the DRBD and
> everything else stays running on node2 (Which is fine because I haven't setup
> colocation constraints for that)
> Here is what I have after node1 is back:
> -----------------------------------------------------
>
> [root@node1 ~]# pcs cluster unstandby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep 6 14:57:46 2016    Last change: Tue Sep 6 14:57:42 2016 by root via cibadmin on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Cluster_VIP (ocf::heartbeat:IPaddr2): Started node2
> Master/Slave Set: ClusterDBclone [ClusterDB]
>     Masters: [ node1 ]
>     Slaves: [ node2 ]
> ClusterFS (ocf::heartbeat:Filesystem): Started node1
> WebSite (ocf::heartbeat:apache): Started node2
>
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete,
>   exitreason='none',
>   last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
>
>
> PCSD Status:
> node1: Online
> node2: Online
>
> [root@node1 ~]#
>
> Any help would be appreciated, I think there is something dumb that I'm
> missing.
>
> Thank you.
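With pcs, fixing the constraints could look something like the following. The constraint ids below are hypothetical examples; list the real ids with `pcs constraint show --full` before removing anything:

```sh
# List all constraints with their ids:
pcs constraint show --full

# Remove the role-less colocations (ids here are examples; use the real ones):
pcs constraint remove colocation-ClusterFS-ClusterDBclone-INFINITY
pcs constraint remove colocation-Cluster_VIP-ClusterDBclone-INFINITY

# Re-add them tied to the master instance of the clone:
pcs constraint colocation add ClusterFS with master ClusterDBclone INFINITY
pcs constraint colocation add Cluster_VIP with master ClusterDBclone INFINITY
```

Once the underlying Filesystem error is also fixed, `pcs resource cleanup ClusterFS` will clear the old failed action from the status output.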
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org