Re: [ClusterLabs] Help needed getting DRBD cluster working
On Tue, Oct 06, 2015 at 10:13:00AM -0500, Ken Gaillot wrote:
> > ms ms_drbd0 drbd_disc0 \
> >         meta master-max="1" master-node-max="1" clone-max="2" \
> >         clone-node-max="1" notify="true" target-role="Started"
>
> You want to omit target-role, or set it to "Master". Otherwise both
> nodes will start as slaves.

That is incorrect. "Started" != "Slave".

target-role "Started" actually means "default for the resource being
handled" (the same as if you just removed that target-role attribute),
which in this case means "start up to clone-max instances, then of
those, promote up to master-max instances".

target-role "Slave" would in fact prohibit promotion, and target-role
"Master" would, back in the day, trigger a Pacemaker bug where it would
try to fulfill target-role and happened to ignore master-max, trying to
promote all instances everywhere ;-)

To summarize:
  not set: default behaviour
  Started: same as not set
  Slave:   do not promote
  Master:  nowadays, for ms resources, same as "Started" or not set,
           but used to trigger a nasty "promote everywhere" bug
           (a few years back)

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
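[Editor's note: following Lars's summary, a minimal ms definition in crmsh syntax would simply leave target-role out. This is an illustrative sketch based on the configuration discussed in the thread, not a tested configuration:]

```
# Leaving target-role unset gives the default behaviour: start up to
# clone-max instances, then promote up to master-max of them.
ms ms_drbd0 drbd_disc0 \
        meta master-max="1" master-node-max="1" clone-max="2" \
             clone-node-max="1" notify="true"
```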
Re: [ClusterLabs] Help needed getting DRBD cluster working
On 10/06/2015 09:38 AM, Gordon Ross wrote:
> On 5 Oct 2015, at 15:05, Ken Gaillot wrote:
>>
>> The "rc=6" in the failed actions means the resource's Pacemaker
>> configuration is invalid. (For OCF return codes, see
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes )
>>
>> The "_monitor_0" means that this was the initial probe that Pacemaker
>> does before trying to start the resource, to make sure it's not
>> already running. As an aside, you probably want to add recurring
>> monitors as well, otherwise Pacemaker won't notice if the resource
>> fails. For example:
>>   op monitor interval="29s" role="Master"
>>   op monitor interval="31s" role="Slave"
>>
>> As to why the probe is failing, it's hard to tell. Double-check your
>> configuration to make sure disc0 is the exact DRBD resource name,
>> Pacemaker can read the DRBD configuration file, etc. You can also try
>> running the DRBD resource agent's "status" command manually to see if
>> it prints a more detailed error message.
>
> I cleared the CIB and re-created most of it with your suggested
> parameters. It now looks like:
>
> node $id="739377522" ct1
> node $id="739377523" ct2
> node $id="739377524" ct3 \
>         attributes standby="on"
> primitive drbd_disc0 ocf:linbit:drbd \
>         params drbd_resource="disc0" \
>         meta target-role="Started" \
>         op monitor interval="19s" on-fail="restart" role="Master" start-delay="10s" timeout="20s" \
>         op monitor interval="20s" on-fail="restart" role="Slave" start-delay="10s" timeout="20s"
> ms ms_drbd0 drbd_disc0 \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true" target-role="Started"

You want to omit target-role, or set it to "Master". Otherwise both
nodes will start as slaves.
> location cli-prefer-drbd_disc0 ms_drbd0 inf: ct2
> location cli-prefer-ms_drbd0 ms_drbd0 inf: ct2

You've given the above constraints different names, but they are
identical: they both say ms_drbd0 can run on ct2 only.

When you're using clone/ms resources, you generally only ever need to
refer to the clone's name, not the resource being cloned. So you don't
need any constraints for drbd_disc0.

You've set symmetric-cluster=false in the cluster options, which means
that Pacemaker will not start resources on any node unless a location
constraint enables it. Here, you're only enabling ct2. Duplicate the
constraint for ct1 (or set symmetric-cluster=true and use a -INF
location constraint for the third node instead).

> property $id="cib-bootstrap-options" \
>         dc-version="1.1.10-42f2063" \
>         cluster-infrastructure="corosync" \
>         stonith-enabled="false" \

I'm sure you've heard this before, but stonith is the only way to avoid
data corruption in a split-brain situation. It's usually best to make
fencing the first priority rather than save it for last, because some
problems can become more difficult to troubleshoot without fencing.

DRBD in particular needs special configuration to coordinate fencing
with Pacemaker:
https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html

>         no-quorum-policy="stop" \
>         symmetric-cluster="false"
>
> I think I'm missing something basic between the DRBD/Pacemaker
> hook-up.
>
> As soon as Pacemaker/Corosync start, DRBD on both nodes stops. A
> "cat /proc/drbd" then just returns:
>
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: 6551AD2C98F533733BE558C
>
> and no details on the replicated disc, and the drbd block device
> disappears.
>
> GTG
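[Editor's note: a sketch of the two alternatives Ken describes, using the resource and node names from the thread; the constraint names themselves are illustrative:]

```
# Option 1: keep symmetric-cluster="false" and enable ms_drbd0
# explicitly on both DRBD nodes:
location loc-ms_drbd0-ct1 ms_drbd0 inf: ct1
location loc-ms_drbd0-ct2 ms_drbd0 inf: ct2

# Option 2: set symmetric-cluster="true" and instead ban the resource
# from the witness node only:
location loc-ms_drbd0-not-ct3 ms_drbd0 -inf: ct3
```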
Re: [ClusterLabs] Help needed getting DRBD cluster working
On 06/10/15 10:38 AM, Gordon Ross wrote:
> stonith-enabled="false" \
> no-quorum-policy="stop" \
> symmetric-cluster="false"
>
> I think I'm missing something basic between the DRBD/Pacemaker
> hook-up.

For one, you must have stonith configured *and* tested in Pacemaker,
then hook DRBD's fencing into it with the crm-{un,}fence-peer.sh
handlers and using 'fencing resource-and-stonith;'.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
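[Editor's note: a sketch of what Digimer describes, for the "disc0" resource from this thread on DRBD 8.4. The handler paths are the usual install location but may differ by distribution; treat this as an illustration, not a verified configuration:]

```
# /etc/drbd.d/disc0.res (excerpt)
resource disc0 {
    disk {
        # Block I/O and fence the peer via Pacemaker on replication loss:
        fencing resource-and-stonith;
    }
    handlers {
        # Constrain the Master role away from the outdated peer...
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        # ...and lift the constraint once resync completes:
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```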
Re: [ClusterLabs] Help needed getting DRBD cluster working
On 5 Oct 2015, at 15:05, Ken Gaillot wrote:
>
> The "rc=6" in the failed actions means the resource's Pacemaker
> configuration is invalid. (For OCF return codes, see
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes )
>
> The "_monitor_0" means that this was the initial probe that Pacemaker
> does before trying to start the resource, to make sure it's not
> already running. As an aside, you probably want to add recurring
> monitors as well, otherwise Pacemaker won't notice if the resource
> fails. For example:
>   op monitor interval="29s" role="Master"
>   op monitor interval="31s" role="Slave"
>
> As to why the probe is failing, it's hard to tell. Double-check your
> configuration to make sure disc0 is the exact DRBD resource name,
> Pacemaker can read the DRBD configuration file, etc. You can also try
> running the DRBD resource agent's "status" command manually to see if
> it prints a more detailed error message.

I cleared the CIB and re-created most of it with your suggested
parameters. It now looks like:

node $id="739377522" ct1
node $id="739377523" ct2
node $id="739377524" ct3 \
        attributes standby="on"
primitive drbd_disc0 ocf:linbit:drbd \
        params drbd_resource="disc0" \
        meta target-role="Started" \
        op monitor interval="19s" on-fail="restart" role="Master" start-delay="10s" timeout="20s" \
        op monitor interval="20s" on-fail="restart" role="Slave" start-delay="10s" timeout="20s"
ms ms_drbd0 drbd_disc0 \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true" target-role="Started"
location cli-prefer-drbd_disc0 ms_drbd0 inf: ct2
location cli-prefer-ms_drbd0 ms_drbd0 inf: ct2
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        stonith-enabled="false" \
        no-quorum-policy="stop" \
        symmetric-cluster="false"

I think I'm missing something basic between the DRBD/Pacemaker hook-up.
As soon as Pacemaker/Corosync start, DRBD on both nodes stops. A
"cat /proc/drbd" then just returns:

version: 8.4.3 (api:1/proto:86-101)
srcversion: 6551AD2C98F533733BE558C

and no details on the replicated disc, and the drbd block device
disappears.

GTG

-- 
Gordon Ross
Re: [ClusterLabs] Help needed getting DRBD cluster working
On 10/05/2015 08:09 AM, Gordon Ross wrote:
> I'm trying to set up a simple DRBD cluster on Ubuntu 14.04 LTS using
> Pacemaker & Corosync. My problem is getting the resource to start up.
>
> I've set up the DRBD aspect fine. Checking /proc/drbd, I can see that
> my test DRBD device is all synced and OK.
>
> Following the examples from the "Clusters From Scratch" document, I
> built the following cluster configuration:
>
> property \
>         stonith-enabled="false" \
>         no-quorum-policy="stop" \
>         symmetric-cluster="false"
> node ct1
> node ct2
> node ct3 attributes standby="on"
> primitive drbd_disc0 ocf:linbit:drbd \
>         params drbd_resource="disc0"
> primitive drbd_disc0_fs ocf:heartbeat:Filesystem \
>         params fstype="ext4" device="/dev/drbd0" directory="/replicated/disc0"
> ms ms_drbd0 drbd_disc0 \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
>         notify="true" target-role="Master"
> colocation filesystem_with_disc inf: drbd_disc0_fs ms_drbd0:Master
>
> ct1 & ct2 are the main DRBD servers, with ct3 being a witness server
> to avoid split-brain problems.
>
> When I look at the cluster status, I get:
>
> crm(live)# status
> Last updated: Mon Oct  5 14:04:12 2015
> Last change: Thu Oct  1 17:31:35 2015 via cibadmin on ct2
> Current DC: ct2 (739377523) - partition with quorum
> 3 Nodes configured
> 3 Resources configured
>
> Node ct3 (739377524): standby
> Online: [ ct1 ct2 ]
>
> Failed actions:
>     drbd_disc0_monitor_0 (node=ct1, call=5, rc=6, status=complete,
>         last-rc-change=Thu Oct  1 16:42:11 2015, queued=60ms, exec=0ms): not configured
>     drbd_disc0_monitor_0 (node=ct2, call=5, rc=6, status=complete,
>         last-rc-change=Thu Oct  1 16:17:17 2015, queued=67ms, exec=0ms): not configured
>     drbd_disc0_monitor_0 (node=ct3, call=5, rc=6, status=complete,
>         last-rc-change=Thu Oct  1 16:42:10 2015, queued=54ms, exec=0ms): not configured
>
> What have I done wrong?
The "rc=6" in the failed actions means the resource's Pacemaker
configuration is invalid. (For OCF return codes, see
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes )

The "_monitor_0" means that this was the initial probe that Pacemaker
does before trying to start the resource, to make sure it's not already
running. As an aside, you probably want to add recurring monitors as
well, otherwise Pacemaker won't notice if the resource fails. For
example:
  op monitor interval="29s" role="Master"
  op monitor interval="31s" role="Slave"

As to why the probe is failing, it's hard to tell. Double-check your
configuration to make sure disc0 is the exact DRBD resource name,
Pacemaker can read the DRBD configuration file, etc. You can also try
running the DRBD resource agent's "status" command manually to see if
it prints a more detailed error message.
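[Editor's note: combining Ken's suggested recurring monitors with the primitive from the original post, the crmsh definition would look roughly like this. The different Master/Slave intervals are deliberate, since two operations on one resource may not share the same name and interval:]

```
primitive drbd_disc0 ocf:linbit:drbd \
        params drbd_resource="disc0" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
```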