>>> Strahil Nikolov <[email protected]> wrote on 04.04.2022 at 09:21 in message
<[email protected]>:
> Do you have a resource for starting up libvirtd and virtlockd after the
> OCFS2?
Yes:

primitive prm_libvirtd systemd:libvirtd.service ...
primitive prm_lockspace_ocfs2 Filesystem ...
primitive prm_virtlockd systemd:virtlockd ...
clone cln_libvirtd prm_libvirtd ...
clone cln_lockspace_ocfs2 prm_lockspace_ocfs2 ...
clone cln_virtlockd prm_virtlockd ...
colocation col__libvirtd__virtlockd inf: cln_libvirtd cln_virtlockd
colocation col__virtlockd__lockspace_fs inf: cln_virtlockd cln_lockspace_ocfs2
colocation col__vm__libvirtd inf: ( prm_xen_v01 ... ) cln_libvirtd
order ord__libvirtd__vm Mandatory: cln_libvirtd ( prm_xen_v01 ... )
order ord__lockspace_fs__virtlockd Mandatory: cln_lockspace_ocfs2 cln_virtlockd
order ord__virtlockd__libvirtd Mandatory: cln_virtlockd cln_libvirtd

(some resources were left out, but you get the idea)

Regards,
Ulrich

P.S.: Forgot to keep the list in the replies...

> Best Regards,
> Strahil Nikolov
>
> On Mon, Apr 4, 2022 at 10:14, Ulrich Windl <[email protected]> wrote:
> >>> Strahil Nikolov <[email protected]> wrote on 04.04.2022 at 08:42 in
> >>> message <[email protected]>:
>> So, if you use OCFS2 for locking, why is the Hypervisor not responding
>> correctly to the Virt RA?
>
> It seems the VirtualDomain RA requires libvirtd to be running, but at the
> time of the startup probes _nothing_ is running.
> That's how I see it.
>
> pacemaker-controld[7029]:  notice: Result of probe operation for
> prm_xen_rksapv15 on rksaph18: not running
> ### For whatever reason:
> pacemaker-execd[7021]:  notice: executing - rsc:prm_xen_v15 action:stop
> call_id:197
> VirtualDomain(prm_xen_v15)[8768]: INFO: Virtual domain v15 currently has
> no state, retrying.
> VirtualDomain(prm_xen_v15)[8822]: ERROR: Virtual domain v15 has no state
> during stop operation, bailing out.
> VirtualDomain(prm_xen_v15)[8836]: INFO: Issuing forced shutdown (destroy)
> request for domain v15.
> VirtualDomain(prm_xen_v15)[8849]: ERROR: forced stop failed
> pacemaker-controld[7029]:  notice: h18-prm_xen_v15_stop_0:197 [ error:
> failed to connect to the hypervisor error: failed to connect socket to
> '/var/run/libvirt/libvirt-sock': no such file or directory
>
> That caused a repeating fencing loop.
>
> Regards,
> Ulrich
>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Mon, Apr 4, 2022 at 9:39, Ulrich Windl <[email protected]> wrote:
>> >>> Strahil Nikolov <[email protected]> wrote on 01.04.2022 at 15:20 in
>> >>> message <[email protected]>:
>>> To be honest, I have never had to disable it, and as far as I know it's
>>> cluster-wide.
>>> As per my understanding, the cluster checks if the resources are running
>>> before proceeding further. Of course, I might be wrong and it might not
>>> help you.
>>> Why don't you set up a shared filesystem for libvirt's locking? After
>>> all, your VMs use shared storage.
>>
>> ??? There is a shared OCFS2 filesystem used for locking, but that's more
>> a problem than a solution.
>> I wrote: "libvirtd uses locking (virtlockd), which in turn needs a
>> cluster-wide filesystem for locks across the nodes."
>>
>> Regards,
>> Ulrich
>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>> On Fri, Apr 1, 2022 at 15:01, Ulrich Windl <[email protected]> wrote:
>>> >>> Strahil Nikolov <[email protected]> wrote on 01.04.2022 at 00:45 in
>>> >>> message <[email protected]>:
>>>
>>> Hi!
>>>
>>>> What about if you disable the enable-startup-probes at fencing (custom
>>>> fencing that sets it to false and fails, so the next fencing device in
>>>> the topology kicks in)?
>>>
>>> Interesting idea, but I never heard of the property before.
>>> However it's cluster-wide, right?
>>>
>>>> When the node joins, it will skip startup probes, and later a systemd
>>>> service or some script checks if all nodes were up for at least 15-20
>>>> minutes and enables it back?
>>>
>>> Are there any expected disadvantages?
>>>
>>> Regards,
>>> Ulrich
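For what it's worth, toggling that property cluster-wide is a single command;
the following is only a rough sketch of what such a "disable before the fenced
node rejoins, re-enable later" hook might run (standard Pacemaker CLI, not
taken from an actual fencing agent):

  # before the fenced node rejoins: skip startup probes cluster-wide
  crm_attribute --type crm_config --name enable-startup-probes --update false

  # later, once all nodes have stayed up long enough, turn probes back on
  crm_attribute --type crm_config --name enable-startup-probes --update true

  # check the current setting
  crm_attribute --type crm_config --name enable-startup-probes --query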
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On Thu, Mar 31, 2022 at 14:02, Ulrich Windl <[email protected]> wrote:
>>>> >>> "Gao,Yan" <[email protected]> wrote on 31.03.2022 at 11:18 in message
>>>> >>> <[email protected]>:
>>>>> On 2022/3/31 9:03, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I just wanted to point out one thing that hit us with SLES15 SP3:
>>>>>> A failed live VM migration that caused node fencing resulted in a
>>>>>> fencing loop, for two reasons:
>>>>>>
>>>>>> 1) Pacemaker thinks that even _after_ fencing there is some migration
>>>>>> to "clean up". Pacemaker treats the situation as if the VM is running
>>>>>> on both nodes, thus (50% chance?) trying to stop the VM on the node
>>>>>> that just booted after fencing. That's stupid, but shouldn't be fatal
>>>>>> IF there weren't...
>>>>>>
>>>>>> 2) The stop operation of the VM (that actually isn't running) fails,
>>>>>
>>>>> AFAICT it could not connect to the hypervisor, but the logic in the RA
>>>>> is kind of arguable: the probe (monitor) of the VM returned "not
>>>>> running", but the stop right after that returned failure...
>>>>>
>>>>> OTOH, the point about pacemaker is that the stop of the resource on the
>>>>> fenced and rejoined node is not really necessary. There have been
>>>>> discussions about this here, and we are trying to figure out a solution
>>>>> for it:
>>>>>
>>>>> https://github.com/ClusterLabs/pacemaker/pull/2146#discussion_r828204919
>>>>>
>>>>> For now it requires the administrator's intervention if the situation
>>>>> happens:
>>>>> 1) Fix the access to the hypervisor before the fenced node rejoins.
>>>>
>>>> Thanks for the explanation!
>>>>
>>>> Unfortunately this can be tricky if libvirtd is involved (as it is here):
>>>> libvirtd uses locking (virtlockd), which in turn needs a cluster-wide
>>>> filesystem for locks across the nodes.
>>>> When that filesystem is provided by the cluster, it's hard to delay node
>>>> joining until the filesystem, virtlockd and libvirtd are running.
>>>>
>>>> (The issue had been discussed before: It does not make sense to run some
>>>> probes when those probes need other resources to detect the status.
>>>> With just a Boolean status return, at best all those probes could say
>>>> "not running". Ideally a third status like "please try again some later
>>>> time" would be needed, or probes should follow the dependencies of their
>>>> resources (which may open another can of worms).)
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>>> 2) Manually clean up the resource, which tells pacemaker it can safely
>>>>> forget the historical migrate_to failure.
>>>>>
>>>>> Regards,
>>>>> Yan
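Yan's second step would boil down to something like this once libvirtd is
reachable again on the rejoined node (a sketch using the resource and node
names from the logs in this thread):

  # tell pacemaker to forget the failed migrate_to / stop history
  crm_resource --cleanup --resource prm_xen_v15 --node h18

  # or, with crmsh:
  crm resource cleanup prm_xen_v15 h18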
>>>>>> causing a node fence. So the loop is complete.
>>>>>>
>>>>>> Some details (many unrelated messages left out):
>>>>>>
>>>>>> Mar 30 16:06:14 h16 libvirtd[13637]: internal error: libxenlight
>>>>>> failed to restore domain 'v15'
>>>>>>
>>>>>> Mar 30 16:06:15 h19 pacemaker-schedulerd[7350]:  warning: Unexpected
>>>>>> result (error: v15: live migration to h16 failed: 1) was recorded for
>>>>>> migrate_to of prm_xen_v15 on h18 at Mar 30 16:06:13 2022
>>>>>>
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]:  warning: Unexpected
>>>>>> result (OCF_TIMEOUT) was recorded for stop of prm_libvirtd:0 on h18 at
>>>>>> Mar 30 16:13:36 2022
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]:  warning: Unexpected
>>>>>> result (OCF_TIMEOUT) was recorded for stop of prm_libvirtd:0 on h18 at
>>>>>> Mar 30 16:13:36 2022
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]:  warning: Cluster node
>>>>>> h18 will be fenced: prm_libvirtd:0 failed there
>>>>>>
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]:  warning: Unexpected
>>>>>> result (error: v15: live migration to h18 failed: 1) was recorded for
>>>>>> migrate_to of prm_xen_v15 on h16 at Mar 29 23:58:40 2022
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]:  error: Resource
>>>>>> prm_xen_v15 is active on 2 nodes (attempting recovery)
>>>>>>
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]:  notice:  * Restart
>>>>>> prm_xen_v15 ( h18 )
>>>>>>
>>>>>> Mar 30 16:19:04 h18 VirtualDomain(prm_xen_v15)[8768]: INFO: Virtual
>>>>>> domain v15 currently has no state, retrying.
>>>>>> Mar 30 16:19:05 h18 VirtualDomain(prm_xen_v15)[8787]: INFO: Virtual
>>>>>> domain v15 currently has no state, retrying.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8822]: ERROR: Virtual
>>>>>> domain v15 has no state during stop operation, bailing out.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8836]: INFO: Issuing
>>>>>> forced shutdown (destroy) request for domain v15.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8860]: ERROR: forced
>>>>>> stop failed
>>>>>>
>>>>>> Mar 30 16:19:07 h19 pacemaker-controld[7351]: notice: Transition 124
>>>>>> action 115 (prm_xen_v15_stop_0 on h18): expected 'ok' but got 'error'
>>>>>>
>>>>>> Note: Our cluster nodes start pacemaker during boot. Yesterday I was
>>>>>> there when the problem happened. But as we had another boot loop some
>>>>>> time ago, I wrote a systemd service that counts boots, and if too many
>>>>>> happen within a short time, pacemaker will be disabled on that node.
>>>>>> As it is set now, the counter is reset if the node is up for at least
>>>>>> 15 minutes; if it fails more than 4 times to do so, pacemaker will be
>>>>>> disabled. If someone wants to try that or give feedback, drop me a
>>>>>> line, so I could provide the RPM (boot-loop-handler-0.0.5-0.0.noarch)...
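The boot-loop-handler package itself is not included here; a minimal sketch of
the idea (hypothetical paths and thresholds, run as a plain long-running
systemd service started at boot, not the actual RPM) might look like:

  #!/bin/sh
  # Count consecutive boots that did not survive 15 minutes; after more than
  # 4 of them, disable pacemaker so the node stops rejoining and being fenced.
  COUNT_FILE=/var/lib/boot-loop-handler/count   # hypothetical location
  mkdir -p "${COUNT_FILE%/*}"
  count=$(cat "$COUNT_FILE" 2>/dev/null)
  count=$(( ${count:-0} + 1 ))
  echo "$count" > "$COUNT_FILE"

  if [ "$count" -gt 4 ]; then
      logger -t boot-loop-handler "suspected boot loop ($count rapid boots), disabling pacemaker"
      systemctl disable --now pacemaker.service
      exit 0
  fi

  # Still running 15 minutes later: the boot counts as good, reset the counter.
  sleep 900
  echo 0 > "$COUNT_FILE"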
>>>>>> Regards,
>>>>>> Ulrich

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
