Re: [ClusterLabs] HALVM monitor action fails on slave node. Possible bug?
Hi Emmanuel, thank you for your support. I did a lot of checks over the weekend and there are some updates:

- The main problem is that ocf:heartbeat:LVM is old. The current version on CentOS 7 is 3.9.5 (package resource-agents). More precisely, in 3.9.5 the monitor function makes one important assumption: the underlying storage is shared between all nodes of the cluster, so the monitor function checks for the presence of the volume group on all nodes. From version 3.9.6 this is no longer the behavior: the monitor function (LVM_status) returns $OCF_NOT_RUNNING on slave nodes without errors. You can check this in the file /usr/lib/ocf/resource.d/heartbeat/LVM, lines 340-351, which disappear in version 3.9.6 (a quick check is sketched after this message).
- Obviously this is not an error, but an important change in the cluster architecture, because with version 3.9.5 I would need to run drbd in dual-primary mode. My personal opinion is that drbd in dual-primary mode with LVM is not a good idea here, given that I don't need an active/active cluster.

Anyway, thank you for your time again
Marco
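A quick way to verify which of the two monitor behaviors the installed agent has (a sketch; the volume group name is the one from the configuration quoted below, and exit code 7 is $OCF_NOT_RUNNING per the OCF spec):

    # which resource-agents release is installed?
    rpm -q resource-agents

    # 3.9.6+ has an early "return $OCF_NOT_RUNNING" path for a VG that is
    # not active locally; 3.9.5 instead errors out when the VG is not
    # visible on this node
    grep -n 'OCF_NOT_RUNNING' /usr/lib/ocf/resource.d/heartbeat/LVM

    # run the monitor action by hand on the slave and inspect the exit code
    OCF_ROOT=/usr/lib/ocf OCF_RESKEY_volgrpname=havolumegroup \
        /usr/lib/ocf/resource.d/heartbeat/LVM monitor; echo $?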
Re: [ClusterLabs] HALVM monitor action fails on slave node. Possible bug?
the first thing you need to configure is stonith, because you have this constraint: "constraint order promote DrbdResClone then start HALVM".

To recover and promote drbd to master when you crash a node, configure the drbd fencing handler (see the sketch after this message).

pacemaker executes monitor on both nodes, so this is normal; to test why monitor fails, use ocf-tester.
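For illustration, a minimal sketch of the fencing handler Emmanuel refers to, using the drbd 8.4 resource-level fencing hooks shipped with drbd-utils, plus an ocf-tester invocation (resource and volume group names are taken from Marco's post below; treat the exact values as assumptions):

    # /etc/drbd.d/myres.res (fragment) -- resource-level fencing for drbd 8.4
    resource myres {
        disk {
            fencing resource-only;    # fence the resource, not the whole node
        }
        handlers {
            # add/remove a pacemaker constraint so an outdated peer
            # cannot be promoted to Master
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }

    # drive the agent's actions (start/monitor/stop/...) outside pacemaker
    ocf-tester -n HALVM -o volgrpname=havolumegroup \
        /usr/lib/ocf/resource.d/heartbeat/LVM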
[ClusterLabs] HALVM monitor action fails on slave node. Possible bug?
Hello, I'm trying to configure a simple 2-node cluster with drbd and HALVM (ocf:heartbeat:LVM), but I have a problem that I'm not able to solve, so I decided to write this long post. I need to really understand what I'm doing and where I'm going wrong. More precisely, I'm configuring a pacemaker cluster with 2 nodes and only one drbd resource. Here are all the operations:

- System configuration
    hostnamectl set-hostname pcmk[12]
    yum update -y
    yum install vim wget git -y
    vim /etc/sysconfig/selinux   -> permissive mode
    systemctl disable firewalld
    reboot

- Network configuration
    [pcmk1]
    nmcli connection modify corosync ipv4.method manual ipv4.addresses 192.168.198.201/24 ipv6.method ignore connection.autoconnect yes
    nmcli connection modify replication ipv4.method manual ipv4.addresses 192.168.199.201/24 ipv6.method ignore connection.autoconnect yes
    [pcmk2]
    nmcli connection modify corosync ipv4.method manual ipv4.addresses 192.168.198.202/24 ipv6.method ignore connection.autoconnect yes
    nmcli connection modify replication ipv4.method manual ipv4.addresses 192.168.199.202/24 ipv6.method ignore connection.autoconnect yes

    ssh-keygen -t rsa
    ssh-copy-id root@pcmk[12]
    scp /etc/hosts root@pcmk2:/etc/hosts

- Drbd repo configuration and drbd installation
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
    yum update -y
    yum install drbd84-utils kmod-drbd84 -y

- Drbd configuration
    Creating a new partition on top of /dev/vdb -> /dev/vdb1 of type "Linux" (83)

    [/etc/drbd.d/global_common.conf]
    usage-count no;

    [/etc/drbd.d/myres.res]
    resource myres {
        on pcmk1 {
            device /dev/drbd0;
            disk /dev/vdb1;
            address 192.168.199.201:7789;
            meta-disk internal;
        }
        on pcmk2 {
            device /dev/drbd0;
            disk /dev/vdb1;
            address 192.168.199.202:7789;
            meta-disk internal;
        }
    }

    scp /etc/drbd.d/myres.res root@pcmk2:/etc/drbd.d/myres.res
    systemctl start drbd    <-- only for test. The service is disabled at boot!
    drbdadm create-md myres
    drbdadm up myres
    drbdadm primary --force myres

- LVM configuration
    [root@pcmk1 ~]# lsblk
    NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sr0            11:0    1 1024M  0 rom
    vda           252:0    0   20G  0 disk
    ├─vda1        252:1    0    1G  0 part /boot
    └─vda2        252:2    0   19G  0 part
      ├─cl-root   253:0    0   17G  0 lvm  /
      └─cl-swap   253:1    0    2G  0 lvm  [SWAP]
    vdb           252:16   0    8G  0 disk
    └─vdb1        252:17   0    8G  0 part   <--- the partition I'd like to use as backing device for drbd
      └─drbd0     147:0    0    8G  0 disk

    [/etc/lvm/lvm.conf]
    write_cache_state = 0
    use_lvmetad = 0
    filter = [ "a|drbd.*|", "a|vda.*|", "r|.*|" ]

    Disabling the lvmetad service:
    systemctl disable lvm2-lvmetad.service
    systemctl disable lvm2-lvmetad.socket
    reboot

- Creating volume group and logical volume
    systemctl start drbd    (both nodes)
    drbdadm primary myres
    pvcreate /dev/drbd0
    vgcreate havolumegroup /dev/drbd0
    lvcreate -n c-vol1 -L1G havolumegroup
    [root@pcmk1 ~]# lvs
      LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      root   cl            -wi-ao---- <17.00g
      swap   cl            -wi-ao----   2.00g
      c-vol1 havolumegroup -wi-a-----   1.00g

- Cluster configuration
    yum install pcs fence-agents-all -y
    systemctl enable pcsd
    systemctl start pcsd
    echo redhat | passwd --stdin hacluster
    pcs cluster auth pcmk1 pcmk2
    pcs cluster setup --name ha_cluster pcmk1 pcmk2
    pcs cluster start --all
    pcs cluster enable --all
    pcs property set stonith-enabled=false    <--- Just for test!!!
    pcs property set no-quorum-policy=ignore

- Drbd resource configuration
    pcs cluster cib drbd_cfg
    pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd drbd_resource=myres op monitor interval=60s
    pcs -f drbd_cfg resource master DrbdResClone DrbdRes master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    [root@pcmk1 ~]# pcs -f drbd_cfg resource show
     Master/Slave Set: DrbdResClone [DrbdRes]
         Stopped: [ pcmk1 pcmk2 ]
    [root@pcmk1 ~]#

    Testing the failover with a forced shutoff of pcmk1. When pcmk1 comes back up, drbd is slave, but the logical volume is not active on pcmk2. So I need HALVM:
    [root@pcmk2 ~]# lvs
      LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      root   cl            -wi-ao---- <17.00g
      swap   cl            -wi-ao----   2.00g
      c-vol1 havolumegroup -wi-------   1.00g
    [root@pcmk2 ~]#

- Lvm resource and constraints
    pcs cluster cib lvm_cfg
    pcs -f lvm_cfg resource create HALVM ocf:heartbeat:LVM
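A sketch of how this last step typically continues, assuming the volume group created above (the order constraint is the one quoted by Emmanuel; exclusive=true, the monitor interval, and the colocation rule are illustrative assumptions, not Marco's exact commands):

    pcs -f lvm_cfg resource create HALVM ocf:heartbeat:LVM \
        volgrpname=havolumegroup exclusive=true \
        op monitor interval=10s
    # activate havolumegroup only where drbd is Master, and only after promotion
    pcs -f lvm_cfg constraint colocation add HALVM with master DrbdResClone INFINITY
    pcs -f lvm_cfg constraint order promote DrbdResClone then start HALVM
    pcs cluster cib-push lvm_cfg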