Hi,

Is there some new magic I'm unaware of that needs to be added to a pacemaker cluster using a nested DRBD setup? This is pacemaker 2.0.x and DRBD 8.4.10 on Debian/Buster, on a 2-node cluster with stonith. Eventually this will host a bunch of Xen VMs.
I had this sort of thing running for years with pacemaker 1.x and DRBD 8.4.x without a hitch, and now with pacemaker 2.0 and DRBD 8.4.10 it gives me errors when trying to start the volume group vg0 in this chain:

 (VG)          (LV)          (PV)        (VG)
vmspace ----> xen_lv0 ----> drbd0 ----> vg0

Only drbd0 and everything after it are managed by pacemaker.

Here's what I have configured so far (stonith is configured but is not shown below):

---
primitive p_lvm_vg0 ocf:heartbeat:LVM \
        params volgrpname=vg0 \
        op monitor timeout=30s interval=10s \
        op_params interval=10s

primitive resDRBDr0 ocf:linbit:drbd \
        params drbd_resource=r0 \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s \
        op monitor interval=29s role=Master timeout=240s \
        op monitor interval=31s role=Slave timeout=240s \
        meta migration-threshold=3 failure-timeout=120s

ms ms_drbd_r0 resDRBDr0 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

colocation c_lvm_vg0_on_drbd_r0 inf: p_lvm_vg0 ms_drbd_r0:Master

order o_drbd_r0_before_lvm_vg0 Mandatory: ms_drbd_r0:promote p_lvm_vg0:start
---

/etc/lvm/lvm.conf has global_filter set to:

global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]

But I'm not sure if it's sufficient. I seem to be missing some crucial ingredient.
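FWIW, this is roughly how I've been sanity-checking the stack by hand on the current DRBD Primary, outside of pacemaker (from memory, so treat it as a sketch rather than exact output):

---
# as root on the node where r0 is (or should be) Primary
drbdadm role r0              # expect Primary/Secondary here
pvs                          # /dev/drbd0 should show up as the PV backing vg0
vgchange -ay vg0             # manual activation, roughly what the LVM RA does
lvs -o lv_name,lv_attr vg0   # check whether any LVs actually came up active
---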
syslog on the DC shows the following when trying to start vg0:

---
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical volumes. This may take a while...
  Found volume group "vmspace" using metadata type lvm2
  Found volume group "freespace" using metadata type lvm2
  Found volume group "vg0" using metadata type lvm2
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: 0 logical volume(s) in volume group "vg0" now active
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not available (stopped)
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not activate correctly
Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice: p_lvm_vg0_start_0:8775:stderr [ Configuration node global/use_lvmetad not found ]
Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice: p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate correctly ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Result of start operation for p_lvm_vg0 on node2: 7 (not running)
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: node2-p_lvm_vg0_start_0:77 [ Configuration node global/use_lvmetad not found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]: warning: Action 42 (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602 aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602 (Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-39.bz2): Complete
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: On loss of quorum: Ignore
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing failed start of p_lvm_vg0 on node1: not running
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Forcing p_lvm_vg0 away from node1 after 1000000 failures (max=1000000)
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice:  * Recover    p_lvm_vg0    ( node2 )
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: Calculated transition 603, saving inputs in /var/lib/pacemaker/pengine/pe-input-40.bz2
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: notice: On loss of quorum: Ignore
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing failed start of p_lvm_vg0 on node1: not running
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Forcing p_lvm_vg0 away from node2 after 1000000 failures (max=1000000)
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Forcing p_lvm_vg0 away from node1 after 1000000 failures (max=1000000)
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: notice:  * Stop       p_lvm_vg0    ( node2 )   due to node availability
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: notice: Calculated transition 604, saving inputs in /var/lib/pacemaker/pengine/pe-input-41.bz2
Oct 28 14:42:57 node2 pacemaker-controld[27057]: notice: Initiating stop operation p_lvm_vg0_stop_0 locally on node2
Oct 28 14:42:57 node2 LVM(p_lvm_vg0)[8881]: INFO: Deactivating volume group vg0
Oct 28 14:42:57 node2 LVM(p_lvm_vg0)[8881]: INFO: 0 logical volume(s) in volume group "vg0" now active
Oct 28 14:42:57 node2 LVM(p_lvm_vg0)[8881]: INFO: LVM Volume vg0 is not available (stopped)
---

Any help gratefully accepted!

jf
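P.S. In case the layering above isn't clear, the nested part of the stack was originally put together along these lines (reconstructed from memory; the size is made up):

---
lvcreate -L 500G -n xen_lv0 vmspace   # LV in vmspace that backs the DRBD device
drbdadm create-md r0                  # r0's metadata on top of xen_lv0
drbdadm up r0                         # brings up /dev/drbd0
pvcreate /dev/drbd0                   # drbd0 becomes the sole PV ...
vgcreate vg0 /dev/drbd0               # ... of the nested VG vg0
---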