>>> Gang He <[email protected]> wrote on 21.01.2021 at 11:30 in message
<[email protected]>:
> Hi Ulrich,
>
> Is the problem reproducible reliably? Could you share your Pacemaker
> crm configuration and the related OS/lvm2/resource-agents version
> information?
OK, the problem occurred on every node, so I guess it's reproducible.
The OS is SLES15 SP2 with all current updates
(lvm2-2.03.05-8.18.1.x86_64, pacemaker-2.0.4+20200616.2deceaa3a-3.3.1.x86_64,
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64).
The configuration (somewhat trimmed) is attached.

The only VG the cluster node sees is:

ph16:~ # vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  sys   1   3   0 wz--n- 222.50g    0

Regards,
Ulrich

> I feel the problem was probably caused by the lvmlockd resource agent
> script, which did not handle this corner case correctly.
>
> Thanks
> Gang
>
> On 2021/1/21 17:53, Ulrich Windl wrote:
>> Hi!
>>
>> I have a problem: For tests I had configured lvmlockd. Now that the
>> tests have ended, no LVM is used for cluster resources any more, but
>> lvmlockd is still configured.
>> Unfortunately I ran into this problem:
>> One OCFS2 mount was unmounted successfully; another one, holding the
>> lockspace for lvmlockd, is still active.
>> lvmlockd shuts down. At least it says so.
>>
>> Unfortunately that stop never succeeds (it runs into a timeout).
>>
>> My suspicion is something like this:
>> Some non-LVM lock exists for the now unmounted OCFS2 filesystem, and
>> lvmlockd wants to access that filesystem for unknown reasons.
>>
>> I don't understand what's going on.
>>
>> The events at node shutdown were:
>> A Xen PVM was live-migrated successfully to another node, but during
>> that there was a message like this:
>> Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
>> Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
>> Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource
>> '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is
>> not locked
>> Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
>> Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
>> Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource
>> '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is
>> not locked
>> Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test-jeos4
>> Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test-jeos4)[32786]: INFO:
>> test-jeos4: live migration to h18 succeeded.
>>
>> Unfortunately the log message makes it practically impossible to guess
>> what the locked object actually is (it seems to be an indirect lock
>> using a SHA-256 hash).
>>
>> Then the OCFS2 filesystem for the VM images unmounts successfully while
>> the stop of lvmlockd is still busy:
>> Jan 21 10:20:16 h19 lvmlockd(prm_lvmlockd)[32945]: INFO: stop the
>> lockspaces of shared VG(s)...
>> ...
>> Jan 21 10:21:56 h19 pacemaker-controld[42493]: error: Result of stop
>> operation for prm_lvmlockd on h19: Timed Out
>>
>> As said before: I don't have shared VGs any more. I don't understand.
>>
>> On a node without VMs running I see:
>> h19:~ # lvmlockctl -d
>> 1611221190 lvmlockd started
>> 1611221190 No lockspaces found to adopt
>> 1611222560 new cl 1 pi 2 fd 8
>> 1611222560 recv client[10817] cl 1 dump_info . "" mode iv flags 0
>> 1611222560 send client[10817] cl 1 dump result 0 dump_len 149
>> 1611222560 send_dump_buf delay 0 total 149
>> 1611222560 close client[10817] cl 1 fd 8
>> 1611222563 new cl 2 pi 2 fd 8
>> 1611222563 recv client[10818] cl 2 dump_log . "" mode iv flags 0
>>
>> On a node with VMs running I see:
>> h16:~ # lvmlockctl -d
>> 1611216942 lvmlockd started
>> 1611216942 No lockspaces found to adopt
>> 1611221684 new cl 1 pi 2 fd 8
>> 1611221684 recv pvs[17159] cl 1 lock gl "" mode sh flags 0
>> 1611221684 lockspace "lvm_global" not found for dlm gl, adding...
>> 1611221684 add_lockspace_thread dlm lvm_global version 0
>> 1611221684 S lvm_global lm_add_lockspace dlm wait 0 adopt 0
>> 1611221685 S lvm_global lm_add_lockspace done 0
>> 1611221685 S lvm_global R GLLK action lock sh
>> 1611221685 S lvm_global R GLLK res_lock cl 1 mode sh
>> 1611221685 S lvm_global R GLLK lock_dlm
>> 1611221685 S lvm_global R GLLK res_lock rv 0 read vb 0 0 0
>> 1611221685 S lvm_global R GLLK res_lock all versions zero
>> 1611221685 S lvm_global R GLLK res_lock invalidate global state
>> 1611221685 send pvs[17159] cl 1 lock gl rv 0
>> 1611221685 recv pvs[17159] cl 1 lock vg "sys" mode sh flags 0
>> 1611221685 lockspace "lvm_sys" not found
>> 1611221685 send pvs[17159] cl 1 lock vg rv -210 ENOLS
>> 1611221685 close pvs[17159] cl 1 fd 8
>> 1611221685 S lvm_global R GLLK res_unlock cl 1 from close
>> 1611221685 S lvm_global R GLLK unlock_dlm
>> 1611221685 S lvm_global R GLLK res_unlock lm done
>> 1611222582 new cl 2 pi 2 fd 8
>> 1611222582 recv client[19210] cl 2 dump_log . "" mode iv flags 0
>>
>> Note: "lvm_sys" may refer to the VG "sys" used for the hypervisor.
>>
>> Regards,
>> Ulrich
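P.S. On the virtlockd messages: as far as I know, libvirt's lockd plugin,
when it uses a file-based lockspace, names each lock resource after the
SHA-256 digest of the fully qualified disk image path. So the object behind
such a hash can be identified by hashing candidate paths, e.g. (the image
path below is just an invented example):

  # hash a candidate disk path and compare with the hash from the log
  echo -n /var/lib/libvirt/images/test-jeos4.qcow2 | sha256sum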
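For what it's worth, these commands show what the DLM and lvmlockd still
hold while such a stop is hanging; judging from the "stop the lockspaces
of shared VG(s)" message (I have not verified the agent script itself),
the resource agent's stop amounts to something like the last two:

  dlm_tool ls                   # list active DLM lockspaces
  lvmlockctl --info             # lockspaces and locks as lvmlockd sees them
  vgchange --lockstop           # stop the lockspaces of all shared VGs
  lvmlockctl --stop-lockspaces  # tell lvmlockd to stop remaining lockspaces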
[Attachment: config (binary data)]
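Since the attachment is binary, here is a minimal sketch of the kind of
stack being discussed, in crm shell syntax (the resource names are
invented; the real configuration may differ):

  primitive dlm ocf:pacemaker:controld \
          op monitor interval=60s timeout=60s
  primitive lvmlockd ocf:heartbeat:lvmlockd \
          op monitor interval=60s timeout=90s
  group base-group dlm lvmlockd
  clone base-clone base-group meta interleave=true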
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
