Andrei -- To follow up, here is the Pacemaker config. Let's not talk about fencing or quorum right now. I want to focus on the vdo issue at hand.
[root@ha09a ~]# pcs config
Cluster Name: ha09ab
Corosync Nodes:
 ha09a ha09b
Pacemaker Nodes:
 ha09a ha09b

Resources:
 Clone: p_drbd0-clone
  Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true promoted-max=1 promoted-node-max=1
  Resource: p_drbd0 (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=ha01_mysql
   Operations: demote interval=0s timeout=90 (p_drbd0-demote-interval-0s)
               monitor interval=60s (p_drbd0-monitor-interval-60s)
               notify interval=0s timeout=90 (p_drbd0-notify-interval-0s)
               promote interval=0s timeout=90 (p_drbd0-promote-interval-0s)
               reload interval=0s timeout=30 (p_drbd0-reload-interval-0s)
               start interval=0s timeout=240 (p_drbd0-start-interval-0s)
               stop interval=0s timeout=100 (p_drbd0-stop-interval-0s)
 Clone: p_drbd1-clone
  Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true promoted-max=1 promoted-node-max=1
  Resource: p_drbd1 (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=ha02_mysql
   Operations: demote interval=0s timeout=90 (p_drbd1-demote-interval-0s)
               monitor interval=60s (p_drbd1-monitor-interval-60s)
               notify interval=0s timeout=90 (p_drbd1-notify-interval-0s)
               promote interval=0s timeout=90 (p_drbd1-promote-interval-0s)
               reload interval=0s timeout=30 (p_drbd1-reload-interval-0s)
               start interval=0s timeout=240 (p_drbd1-start-interval-0s)
               stop interval=0s timeout=100 (p_drbd1-stop-interval-0s)
 Resource: p_vdo0 (class=lsb type=vdo0)
  Operations: force-reload interval=0s timeout=15 (p_vdo0-force-reload-interval-0s)
              monitor interval=15 timeout=15 (p_vdo0-monitor-interval-15)
              restart interval=0s timeout=15 (p_vdo0-restart-interval-0s)
              start interval=0s timeout=15 (p_vdo0-start-interval-0s)
              stop interval=0s timeout=15 (p_vdo0-stop-interval-0s)
 Resource: p_vdo1 (class=lsb type=vdo1)
  Operations: force-reload interval=0s timeout=15 (p_vdo1-force-reload-interval-0s)
              monitor interval=15 timeout=15 (p_vdo1-monitor-interval-15)
              restart interval=0s timeout=15 (p_vdo1-restart-interval-0s)
              start interval=0s timeout=15 (p_vdo1-start-interval-0s)
              stop interval=0s timeout=15 (p_vdo1-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  promote p_drbd0-clone then start p_vdo0 (kind:Mandatory) (id:order-p_drbd0-clone-p_vdo0-mandatory)
  promote p_drbd1-clone then start p_vdo1 (kind:Mandatory) (id:order-p_drbd1-clone-p_vdo1-mandatory)
Colocation Constraints:
  p_vdo0 with p_drbd0-clone (score:INFINITY) (id:colocation-p_vdo0-p_drbd0-clone-INFINITY)
  p_vdo1 with p_drbd1-clone (score:INFINITY) (id:colocation-p_vdo1-p_drbd1-clone-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
  Meta Attrs: rsc_defaults-meta_attributes
    resource-stickiness=100
Operations Defaults:
  Meta Attrs: op_defaults-meta_attributes
    timeout=30s

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ha09ab
 dc-version: 2.0.4-6.el8_3.2-2deceaa3ae
 have-watchdog: false
 last-lrm-refresh: 1621198059
 maintenance-mode: false
 no-quorum-policy: ignore
 stonith-enabled: false

Tags:
 No tags defined

Quorum:
  Options:
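For reference, constraints like the ones above map onto pcs commands roughly as follows. Note that the colocation as written ties each vdo resource to the drbd clone as a whole; pinning it to the master (promoted) role instead is only a guess at the intent here, since the vdo resources can only run where DRBD is promoted:

# as currently configured (order on promotion, colocation with the clone)
pcs constraint order promote p_drbd0-clone then start p_vdo0
pcs constraint colocation add p_vdo0 with p_drbd0-clone INFINITY

# possible alternative -- colocate with the master role of the clone
pcs constraint colocation add p_vdo0 with master p_drbd0-clone INFINITY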
Here is the cluster status. Right now, node ha09a is primary for both drbd disks.

[root@ha09a ~]# pcs status
Cluster name: ha09ab
Cluster Summary:
  * Stack: corosync
  * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition with quorum
  * Last updated: Mon May 17 11:35:34 2021
  * Last change:  Mon May 17 11:34:24 2021 by hacluster via crmd on ha09a
  * 2 nodes configured
  * 6 resource instances configured (2 BLOCKED from further action due to failure)

Node List:
  * Online: [ ha09a ha09b ]

Full List of Resources:
  * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
    * Masters: [ ha09a ]
    * Slaves: [ ha09b ]
  * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
    * Masters: [ ha09a ]
    * Slaves: [ ha09b ]
  * p_vdo0 (lsb:vdo0): FAILED ha09a (blocked)
  * p_vdo1 (lsb:vdo1): FAILED ha09a (blocked)

Failed Resource Actions:
  * p_vdo1_stop_0 on ha09a 'error' (1): call=21, status='Timed Out', exitreason='', last-rc-change='2021-05-17 11:29:09 -07:00', queued=0ms, exec=15001ms
  * p_vdo0_stop_0 on ha09a 'error' (1): call=27, status='Timed Out', exitreason='', last-rc-change='2021-05-17 11:34:26 -07:00', queued=0ms, exec=15001ms
  * p_vdo1_monitor_0 on ha09b 'error' (1): call=21, status='complete', exitreason='', last-rc-change='2021-05-17 11:29:08 -07:00', queued=0ms, exec=217ms
  * p_vdo0_monitor_0 on ha09b 'error' (1): call=28, status='complete', exitreason='', last-rc-change='2021-05-17 11:34:25 -07:00', queued=0ms, exec=182ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

The vdo devices are available...

[root@ha09a ~]# vdo list
vdo0
vdo1

> -----Original Message-----
> From: Users <users-boun...@clusterlabs.org> On Behalf Of Eric Robinson
> Sent: Monday, May 17, 2021 1:28 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
>
> Andrei --
>
> Sorry for the novels. Sometimes it is hard to tell whether people want all
> the configs, logs, and scripts first, or if they want a description of the
> problem and what one is trying to accomplish first. I'll send whatever you
> want. I am very eager to get to the bottom of this.
>
> I'll start with my custom LSB RA. I can send the Pacemaker config a bit later.
>
> [root@ha09a init.d]# ll|grep vdo
> lrwxrwxrwx. 1 root root    9 May 16 10:28 vdo0 -> vdo_multi
> lrwxrwxrwx. 1 root root    9 May 16 10:28 vdo1 -> vdo_multi
> -rwx------. 1 root root 3623 May 16 13:21 vdo_multi
>
> [root@ha09a init.d]# cat vdo_multi
> #!/bin/bash
>
> #--custom script for managing vdo volumes
>
> #--functions
> function isActivated() {
>     R=$(/usr/bin/vdo status -n $VOL 2>&1)
>     if [ $? -ne 0 ]; then
>         #--error occurred checking vdo status
>         echo "$VOL: an error occurred checking activation status on $MY_HOSTNAME"
>         return 1
>     fi
>     R=$(/usr/bin/vdo status -n $VOL|grep Activate|awk '{$1=$1};1'|cut -d" " -f2)
>     echo "$R"
>     return 0
> }
>
> function isOnline() {
>     R=$(/usr/bin/vdo status -n $VOL 2>&1)
>     if [ $? -ne 0 ]; then
>         #--error occurred checking vdo status
>         echo "$VOL: an error occurred checking activation status on $MY_HOSTNAME"
>         return 1
>     fi
>     R=$(/usr/bin/vdo status -n $VOL|grep "Index status"|awk '{$1=$1};1'|cut -d" " -f3)
>     echo "$R"
>     return 0
> }
>
> #--vars
> MY_HOSTNAME=$(hostname -s)
>
> #--get the volume name
> VOL=$(basename $0)
>
> #--get the action
> ACTION=$1
>
> #--take the requested action
> case $ACTION in
>
>     start)
>         #--check current status
>         R=$(isOnline "$VOL")
>         if [ $? -ne 0 ]; then
>             echo "error occurred checking $VOL status on $MY_HOSTNAME"
>             exit 0
>         fi
>         if [ "$R" == "online" ]; then
>             echo "running on $MY_HOSTNAME"
>             exit 0    #--lsb: success
>         fi
>
>         #--enter activation loop
>         ACTIVATED=no
>         TIMER=15
>         while [ $TIMER -ge 0 ]; do
>             R=$(isActivated "$VOL")
>             if [ "$R" == "enabled" ]; then
>                 ACTIVATED=yes
>                 break
>             fi
>             sleep 1
>             TIMER=$(( TIMER-1 ))
>         done
>         if [ "$ACTIVATED" == "no" ]; then
>             echo "$VOL: not activated on $MY_HOSTNAME"
>             exit 5    #--lsb: not running
>         fi
>
>         #--enter start loop
>         /usr/bin/vdo start -n $VOL
>         ONLINE=no
>         TIMER=15
>         while [ $TIMER -ge 0 ]; do
>             R=$(isOnline "$VOL")
>             if [ "$R" == "online" ]; then
>                 ONLINE=yes
>                 break
>             fi
>             sleep 1
>             TIMER=$(( TIMER-1 ))
>         done
>         if [ "$ONLINE" == "yes" ]; then
>             echo "$VOL: started on $MY_HOSTNAME"
>             exit 0    #--lsb: success
>         else
>             echo "$VOL: not started on $MY_HOSTNAME (unknown problem)"
>             exit 0    #--lsb: unknown problem
>         fi
>         ;;
>
>     stop)
>         #--check current status
>         R=$(isOnline "$VOL")
>         if [ $? -ne 0 ]; then
>             echo "error occurred checking $VOL status on $MY_HOSTNAME"
>             exit 0
>         fi
>         if [ "$R" == "not" ]; then
>             echo "not started on $MY_HOSTNAME"
>             exit 0    #--lsb: success
>         fi
>
>         #--enter stop loop
>         /usr/bin/vdo stop -n $VOL
>         ONLINE=yes
>         TIMER=15
>         while [ $TIMER -ge 0 ]; do
>             R=$(isOnline "$VOL")
>             if [ "$R" == "not" ]; then
>                 ONLINE=no
>                 break
>             fi
>             sleep 1
>             TIMER=$(( TIMER-1 ))
>         done
>         if [ "$ONLINE" == "no" ]; then
>             echo "$VOL: stopped on $MY_HOSTNAME"
>             exit 0    #--lsb: success
>         else
>             echo "$VOL: failed to stop on $MY_HOSTNAME (unknown problem)"
>             exit 0
>         fi
>         ;;
>
>     status)
>         R=$(isOnline "$VOL")
>         if [ $? -ne 0 ]; then
>             echo "error occurred checking $VOL status on $MY_HOSTNAME"
>             exit 5
>         fi
>         if [ "$R" == "online" ]; then
>             echo "$VOL started on $MY_HOSTNAME"
>             exit 0    #--lsb: success
>         else
>             echo "$VOL not started on $MY_HOSTNAME"
>             exit 3    #--lsb: not running
>         fi
>         ;;
>
> esac
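One detail that stands out when reading the script against the failed actions above: the stop path can poll for up to roughly 15 seconds (plus the vdo stop call itself), while the configured stop timeout on p_vdo0/p_vdo1 is also 15 seconds, which lines up with the 'Timed Out ... exec=15001ms' stop failures. A sketch of raising that timeout with pcs, where 60s is only an example value:

pcs resource update p_vdo0 op stop interval=0s timeout=60s
pcs resource update p_vdo1 op stop interval=0s timeout=60s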
> > -----Original Message-----
> > From: Users <users-boun...@clusterlabs.org> On Behalf Of Andrei Borzenkov
> > Sent: Monday, May 17, 2021 12:49 PM
> > To: users@clusterlabs.org
> > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> >
> > On 17.05.2021 18:18, Eric Robinson wrote:
> > > To Strahil and Klaus --
> > >
> > > I created the vdo devices using default parameters, so 'auto' mode was
> > > selected by default. vdostatus shows that the current mode is async. The
> > > underlying drbd devices are running protocol C, so I assume that vdo
> > > should be changed to sync mode?
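On the write policy question: the vdo manager can report and change the policy per volume, roughly as sketched below (the grep pattern is approximate, and whether sync is actually appropriate depends on the layers underneath honoring flushes, which is exactly the open question here):

vdo status --name=vdo0 | grep -i "write policy"
vdo changeWritePolicy --name=vdo0 --writePolicy=sync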
> > >
> > > The VDO service is disabled and is solely under the control of Pacemaker,
> > > but I have been unable to get a resource agent to work reliably. I have
> > > two nodes. Under normal operation, Node A is primary for disk drbd0, and
> > > device vdo0 rides on top of that. Node B is primary for disk drbd1 and
> > > device vdo1 rides on top of that. In the event of a node failure, the vdo
> > > device and the underlying drbd disk should migrate to the other node, and
> > > then that node will be primary for both drbd disks and both vdo devices.
> > >
> > > The default systemd vdo service does not work because it uses the --all
> > > flag and starts/stops all vdo devices. I noticed that there is also a
> > > vdo-start-by-dev.service, but there is no documentation on how to use it.
> > > I wrote my own vdo-by-dev system service, but that did not work reliably
> > > either. Then I noticed that there is already an OCF resource agent named
> > > vdo-vol, but that did not work either. I finally tried writing my own
> > > OCF-compliant RA, and then I tried writing an LSB-compliant script, but
> > > none of those worked very well.
> >
> > You continue to write novels instead of simply showing your resource
> > agent, your configuration and logs.
> >
> > > My big problem is that I don't understand how Pacemaker uses the monitor
> > > action. Pacemaker would often fail vdo resources because the monitor
> > > action received an error when it ran on the standby node. For example,
> > > when Node A is primary for disk drbd1 and device vdo1, Pacemaker would
> > > fail device vdo1 because when it ran the monitor action on Node B, the RA
> > > reported an error. But OF COURSE it would report an error, because disk
> > > drbd1 is secondary on that node, and is therefore inaccessible to the vdo
> > > driver. I DON'T UNDERSTAND.
> >
> > Maybe your definition of "error" does not match pacemaker's definition of
> > "error". It is hard to comment without seeing code.
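That is essentially the LSB contract at work: for an lsb: resource, Pacemaker runs the script with 'status' for every probe and recurring monitor, including the probe on the standby node, and it interprets the exit code per the LSB spec -- 0 means running, 3 means not running, and most other values (including the 5 the script above uses when the volume cannot be queried) are treated as failures. A minimal sketch of a status branch that reports "not running" when the volume cannot be queried at all, assuming that is what happens on the node where the DRBD backing device is Secondary; it reuses the isOnline helper from the script above:

status)
    R=$(isOnline "$VOL")
    if [ $? -ne 0 ]; then
        #--cannot query the volume here (e.g. backing DRBD device is Secondary):
        #--answer "not running" so the probe comes back clean instead of failed
        echo "$VOL not running on $MY_HOSTNAME"
        exit 3    #--lsb status: program is not running
    fi
    if [ "$R" == "online" ]; then
        echo "$VOL started on $MY_HOSTNAME"
        exit 0    #--lsb status: running
    else
        echo "$VOL not started on $MY_HOSTNAME"
        exit 3    #--lsb status: not running
    fi
    ;;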
> > >
> > > -Eric
> > >
> > > From: Strahil Nikolov <hunter86...@yahoo.com>
> > > Sent: Monday, May 17, 2021 5:09 AM
> > > To: kwenn...@redhat.com; Klaus Wenninger <kwenn...@redhat.com>; Cluster Labs -
> > > All topics related to open-source clustering welcomed <users@clusterlabs.org>;
> > > Eric Robinson <eric.robin...@psmnv.com>
> > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > >
> > > Have you tried to set VDO in async mode ?
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Mon, May 17, 2021 at 8:57, Klaus Wenninger <kwenn...@redhat.com> wrote:
> > > Did you try VDO in sync-mode for the case the flush-fua stuff isn't working
> > > through the layers?
> > > Did you check that VDO-service is disabled and solely under pacemaker-control
> > > and that the dependencies are set correctly?
> > >
> > > Klaus
> > >
> > > On 5/17/21 6:17 AM, Eric Robinson wrote:
> > >
> > > Yes, DRBD is working fine.
> > >
> > > From: Strahil Nikolov <hunter86...@yahoo.com>
> > > Sent: Sunday, May 16, 2021 6:06 PM
> > > To: Eric Robinson <eric.robin...@psmnv.com>; Cluster Labs - All topics related
> > > to open-source clustering welcomed <users@clusterlabs.org>
> > > Subject: RE: [ClusterLabs] DRBD + VDO HowTo?
> > >
> > > Are you sure that the DRBD is working properly ?
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Mon, May 17, 2021 at 0:32, Eric Robinson <eric.robin...@psmnv.com> wrote:
> > >
> > > Okay, it turns out I was wrong. I thought I had it working, but I keep running
> > > into problems. Sometimes when I demote a DRBD resource on Node A and promote it
> > > on Node B, and I try to mount the filesystem, the system complains that it
> > > cannot read the superblock. But when I move the DRBD primary back to Node A,
> > > the file system is mountable again. Also, I have problems with filesystems not
> > > mounting because the vdo devices are not present. All kinds of issues.
> > >
> > > From: Users <users-boun...@clusterlabs.org> On Behalf Of Eric Robinson
> > > Sent: Friday, May 14, 2021 3:55 PM
> > > To: Strahil Nikolov <hunter86...@yahoo.com>; Cluster Labs - All topics related
> > > to open-source clustering welcomed <users@clusterlabs.org>
> > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > >
> > > Okay, I have it working now. The default systemd service definitions did not
> > > work, so I created my own.
> > >
> > > From: Strahil Nikolov <hunter86...@yahoo.com>
> > > Sent: Friday, May 14, 2021 3:41 AM
> > > To: Eric Robinson <eric.robin...@psmnv.com>; Cluster Labs - All topics related
> > > to open-source clustering welcomed <users@clusterlabs.org>
> > > Subject: RE: [ClusterLabs] DRBD + VDO HowTo?
> > >
> > > There is no VDO RA according to my knowledge, but you can use a systemd service
> > > as a resource.
> > >
> > > Yet, the VDO service that comes with the OS is a generic one and controls all
> > > VDOs - so you need to create your own vdo service.
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Fri, May 14, 2021 at 6:55, Eric Robinson <eric.robin...@psmnv.com> wrote:
> > >
> > > I created the VDO volumes fine on the drbd devices, formatted them as xfs
> > > filesystems, created cluster filesystem resources, and the cluster is using
> > > them. But the cluster won't fail over. Is there a VDO cluster RA out there
> > > somewhere already?
> > >
> > > From: Strahil Nikolov <hunter86...@yahoo.com>
> > > Sent: Thursday, May 13, 2021 10:07 PM
> > > To: Cluster Labs - All topics related to open-source clustering welcomed
> > > <users@clusterlabs.org>; Eric Robinson <eric.robin...@psmnv.com>
> > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > >
> > > For DRBD there is enough info, so let's focus on VDO.
> > >
> > > There is a systemd service that starts all VDOs on the system. You can create
> > > the VDO once drbd is open for writes and then you can create your own systemd
> > > '.service' file which can be used as a cluster resource.
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Fri, May 14, 2021 at 2:33, Eric Robinson <eric.robin...@psmnv.com> wrote:
> > >
> > > Can anyone point to a document on how to use VDO de-duplication with DRBD?
> > > Linbit has a blog page about it, but it was last updated 6 years ago and the
> > > embedded links are dead.
> > >
> > > https://linbit.com/blog/albireo-virtual-data-optimizer-vdo-on-drbd/
> > >
> > > -Eric
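Picking up the per-volume service idea from the quoted messages above: a minimal sketch of what such a unit could look like, written as a shell here-doc (the unit name, the use of /usr/bin/vdo, and the resource name in the last line are all assumptions for illustration; the unit is intentionally left disabled so that only the cluster starts it):

cat > /etc/systemd/system/vdo-vdo0.service <<'EOF'
[Unit]
Description=VDO volume vdo0 (cluster-controlled)

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/vdo start --name=vdo0
ExecStop=/usr/bin/vdo stop --name=vdo0
EOF
systemctl daemon-reload
# then, for example: pcs resource create p_vdo0_svc systemd:vdo-vdo0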
Disclaimer : This email and any files transmitted with it are confidential and intended solely for intended recipients. If you are not the named addressee you should not disseminate, distribute, copy or alter this email. Any views or opinions presented in this email are solely those of the author and might not represent those of Physician Select Management. Warning: Although Physician Select Management has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/