On 08.03.2021 11:57, Ulrich Windl wrote:
>>>> Reid Wahl <[email protected]> wrote on 08.03.2021 at 08:42 in message
> <capiuu9_v0-3k9k-z8+z5u5t8bmh3sl3pzzdolh9g8xcdmfq...@mail.gmail.com>:
>> Did the "active on too many nodes" message happen right after a probe? If
>> so, then it does sound like the probe returned code 0.
>
> Events were like this (I greatly condensed the logs):
> (DC h16 being stopped)
> Mar 05 09:53:45 h16 pacemaker-schedulerd[7189]: notice: * Migrate prm_xen_v09 ( h16 -> h18 )
> Mar 05 09:54:23 h16 pacemaker-controld[7190]: notice: Initiating migrate_to operation prm_xen_v09_migrate_to_0 locally on h16
> Mar 05 09:54:24 h16 libvirtd[8531]: internal error: Failed to send migration data to destination host
> Mar 05 09:54:24 h16 VirtualDomain(prm_xen_v09)[1834]: ERROR: v09: live migration to h18 failed: 1
> Mar 05 09:54:24 h16 pacemaker-controld[7190]: notice: Transition 1000 action 125 (prm_xen_v09_migrate_to_0 on h16): expected 'ok' but got 'error'
> Mar 05 09:54:47 h16 pacemaker-schedulerd[7189]: error: Resource prm_xen_v09 is active on 2 nodes (attempting recovery)
> (not really active on two nodes; DC recovers on h18 where v09 probably isn't running, but should stop on h16 first)
> Mar 05 09:54:47 h16 pacemaker-schedulerd[7189]: notice: * Recover prm_xen_v09 ( h18 )
> Mar 05 09:54:47 h16 VirtualDomain(prm_xen_v09)[2068]: INFO: Issuing graceful shutdown request for domain v09.
> Mar 05 09:55:12 h16 pacemaker-execd[7187]: notice: prm_xen_v09 stop (call 297, PID 2035) exited with status 0 (execution time 25101ms, queue time 0ms)
> Mar 05 09:55:12 h16 pacemaker-controld[7190]: notice: Result of stop operation for prm_xen_v09 on h16: ok
> Mar 05 09:55:14 h16 pacemaker-controld[7190]: notice: Transition 1001 aborted by operation prm_xen_v09_start_0 'modify' on h18: Event failed
> Mar 05 09:55:14 h16 pacemaker-controld[7190]: notice: Transition 1001 action 117 (prm_xen_v09_start_0 on h16): expected 'ok' but got 'error'
> Mar 05 09:55:15 h16 pacemaker-schedulerd[7189]: warning: Unexpected result (error: v09: live migration to h18 failed: 1) was recorded for migrate_to of prm_xen_v09 on h16 at Mar 5 09:54:23 2021
>
> Mar 05 09:55:15 h18 pacemaker-execd[7129]: notice: prm_xen_v09 stop (call 262, PID 46737) exited with status 0 (execution time 309ms, queue time 0ms)
>
> (DC shut down)
> Mar 05 09:55:20 h16 pacemakerd[7183]: notice: Shutdown complete
> Mar 05 09:55:20 h16 systemd[1]: Stopped Corosync Cluster Engine.
>
> (node starting after being stopped)
> Mar 05 10:38:50 h16 systemd[1]: Starting Shared-storage based fencing daemon...
> Mar 05 10:38:50 h16 systemd[1]: Starting Corosync Cluster Engine...
> Mar 05 10:38:59 h16 pacemaker-controld[14022]: notice: Quorum acquired
> Mar 05 10:39:00 h16 pacemaker-controld[14022]: notice: State transition S_PENDING -> S_NOT_DC
> (this probe probably reported nonsense)
> Mar 05 10:39:02 h16 pacemaker-controld[14022]: notice: Result of probe operation for prm_xen_v09 on h16: ok

So the resource agent thinks the resource is active.

> (DC noticed)
> Mar 05 10:39:02 h18 pacemaker-controld[7132]: notice: Transition 5 action 58 (prm_xen_v09_monitor_0 on h16): expected 'not running' but got 'ok'
> (from now on probes should be more reliable)
> Mar 05 10:39:07 h16 systemd[1]: Started Virtualization daemon.
> (there is nothing to stop)
> Mar 05 10:39:09 h16 pacemaker-execd[14019]: notice: executing - rsc:prm_xen_v09 action:stop call_id:166
> (obviously)
> Mar 05 10:40:11 h16 libvirtd[15490]: internal error: Failed to shutdown domain '20' with libxenlight
> (more nonsense)
> Mar 05 10:44:04 h16 VirtualDomain(prm_xen_v09)[17306]: INFO: Issuing forced shutdown (destroy) request for domain v09.
> (eventually)
> Mar 05 10:44:07 h16 pacemaker-controld[14022]: notice: Result of stop operation for prm_xen_v09 on h16: ok
> Mar 05 10:44:07 h16 pacemaker-execd[14019]: notice: executing - rsc:prm_xen_v09 action:start call_id:168
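One quick way to find out what such a probe actually returns is to run the agent's monitor action by hand on the freshly booted node, before libvirtd is up, and look at the exit code. A minimal sketch; the config path and hypervisor URI below are placeholders, not values taken from your cluster:

  # Bypass the cluster and run the agent's check on the local node only
  crm_resource --resource prm_xen_v09 --force-check -V

  # Or call the resource agent directly with a hand-built OCF environment;
  # further OCF_RESKEY_* values may be needed to match your CIB.
  OCF_ROOT=/usr/lib/ocf \
  OCF_RESOURCE_INSTANCE=prm_xen_v09 \
  OCF_RESKEY_config=/etc/xen/vm/v09.xml \
  OCF_RESKEY_hypervisor=xen:///system \
  /usr/lib/ocf/resource.d/heartbeat/VirtualDomain monitor
  echo "rc=$?"   # 0 = OCF_SUCCESS (running), 7 = OCF_NOT_RUNNING, anything else = error

If that reports rc=0 while the domain is clearly not running, the agent (or the environment it sees that early in boot) is what needs debugging, not Pacemaker.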
>
>> If a probe returned 0 and it **shouldn't** have done so, then either the
>> monitor operation needs to be redesigned, or resource-discovery=never (or
>> resource-discovery=exclusive) can be used to prevent the probe from
>> happening where it should not.
>
> Well, the situation here is using virtlockd with indirect locking in a
> cluster where the cluster itself provides the shared filesystem used for
> locking.
>
> Then the obvious ordering is:
> 1) Provide the shared filesystem (mount it)
> 2) Start virtlockd (to put the lock files in a shared place)
> 3) Run libvirtd (using virtlockd)
> 4) Manage VMs using libvirt
>
> Unfortunately probes (expecting to use libvirt) are being run even before 1),
> and I don't know why they return success then.

That is what you need to investigate. A probe needs to answer "is the resource active *now*?". If the probe for a resource is impossible until some other resources are active, something is really wrong with the design. Either the resource is active or it is not.
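Concretely, a probe that cannot reach libvirt should say so instead of guessing. The following is only an illustrative sketch of such a guard, not a quote from the shipped VirtualDomain agent; the URI default and the domain name are placeholders:

  # Illustrative only; assumes ocf-shellfuncs has been sourced, as in any OCF agent.
  probe_domain() {
      local uri="${OCF_RESKEY_hypervisor:-xen:///system}"   # placeholder default
      local domain="v09"                                    # placeholder domain name

      # If we cannot talk to the hypervisor at all, the state is unknown:
      # report an error instead of guessing "running" or "not running".
      if ! virsh --connect "$uri" list --all >/dev/null 2>&1; then
          ocf_log err "cannot connect to $uri; state of $domain is unknown"
          return "$OCF_ERR_GENERIC"
      fi

      if virsh --connect "$uri" domstate "$domain" 2>/dev/null | grep -q '^running'; then
          return "$OCF_SUCCESS"       # definitely active right now
      fi
      return "$OCF_NOT_RUNNING"       # we could check, and it is not active
  }

A failed probe is noisy, but at least it is visible; a wrong "ok" or "not running" silently feeds the scheduler a false picture of where the resource runs, which is what the 10:39:02 probe did here.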
> (Other VMs were probed as "not running")
>
>> If a probe returned 0 and it **should** have done so, but the stop
>> operation on the other node wasn't reflected in the CIB (so that the
>> resource still appeared to be active there), then that's odd.
>
> Well, when reviewing the logs, the cluster may actually have had v09 running
> on h16 even though the node was stopped.
> So the problem was with stopping, not starting, but I still doubt that the
> probe at that time was quite reliable.
>
>> A bug is certainly possible, though we can't say without more detail :)
>
> I see what you mean.
>
> Regards,
> Ulrich
>
>> On Sun, Mar 7, 2021 at 11:10 PM Ulrich Windl <[email protected]> wrote:
>>> Reid Wahl <[email protected]> wrote on 05.03.2021 at 21:22 in message
>>> <capiuu991o08dnavkm9bc8n9bk-+nh9e0_f25o6ddis5wzwg...@mail.gmail.com>:
>>>> On Fri, Mar 5, 2021 at 10:13 AM Ken Gaillot <[email protected]> wrote:
>>>>> On Fri, 2021-03-05 at 11:39 +0100, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I'm unsure what actually causes a problem I see (a resource was
>>>>>> "detected running" when it actually was not), but I'm sure some probe
>>>>>> started on cluster node start cannot provide a useful result until
>>>>>> some other resource has been started. AFAIK there is no way to make a
>>>>>> probe obey ordering or colocation constraints, so the only work-around
>>>>>> seems to be a delay. However I'm unsure whether probes can actually
>>>>>> be delayed.
>>>>>>
>>>>>> Ideas?
>>>>>
>>>>> Ordered probes are a thorny problem that we've never been able to come
>>>>> up with a general solution for. We do order certain probes where we
>>>>> have enough information to know it's safe. The problem is that it is
>>>>> very easy to introduce ordering loops.
>>>>>
>>>>> I don't remember if there are any workarounds.
>>>>
>>>> Maybe as a workaround:
>>>> - Add an ocf:pacemaker:attribute resource after-and-with rsc1
>>>> - Then configure a location rule for rsc2 with resource-discovery=never
>>>>   and score=-INFINITY with expression (in pseudocode) "attribute is not
>>>>   set to active value"
>>>>
>>>> I haven't tested it, but that might cause rsc2's probe to wait until rsc1
>>>> is active.
>>>>
>>>> And of course, use the usual constraints/rules to ensure rsc2's probe
>>>> only runs on rsc1's node.
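Spelled out, that workaround might look roughly like this in crm shell syntax (untested and from memory; rsc1, rsc2, the resource id and the attribute name are placeholders):

  # A node attribute that is set to 1 only while rsc1 is active on the node
  crm configure primitive rsc1-active ocf:pacemaker:attribute \
      params name=rsc1-active active_value=1 inactive_value=0 \
      op monitor interval=10s
  crm configure order o-rsc1-before-attr Mandatory: rsc1 rsc1-active
  crm configure colocation c-attr-with-rsc1 inf: rsc1-active rsc1

  # Never probe (or place) rsc2 on nodes where the attribute is absent or not 1
  crm configure location l-rsc2-needs-rsc1 rsc2 resource-discovery=never \
      rule -inf: not_defined rsc1-active or rsc1-active ne 1

The exact parameter names for the attribute agent can be checked with "crm ra info ocf:pacemaker:attribute"; the XML equivalent is a rsc_location with resource-discovery="never" and a -INFINITY rule on that node attribute.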
>>>>>> Despite that, I wonder whether some probe/monitor return code like
>>>>>> OCF_NOT_READY would make sense if the operation detects that it
>>>>>> cannot return a current status (so both "running" and "stopped" would
>>>>>> be as inadequate as "starting" and "stopping" would be, despite the
>>>>>> fact that the latter two do not exist).
>>>>>
>>>> This seems logically reasonable, independent of any implementation
>>>> complexity and considerations of what we would do with that return code.
>>>
>>> Thanks for the proposal!
>>> The actual problem I was facing was that the cluster claimed some resource
>>> was running on two nodes at the same time, when actually one node had been
>>> stopped properly (with all its resources). The bad state in the CIB was
>>> most likely due to a software bug in pacemaker, but probes on re-starting
>>> the node seemed not to prevent pacemaker from doing a really wrong
>>> "recovery action".
>>> My hope was that probes might update the CIB before some stupid action is
>>> being done. Maybe it's just another software bug...
>>>
>>> Regards,
>>> Ulrich
>>>
>>>>>> Regards,
>>>>>> Ulrich
>>>>>
>>>>> --
>>>>> Ken Gaillot <[email protected]>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Reid Wahl, RHCA
>>>> Senior Software Maintenance Engineer, Red Hat
>>>> CEE - Platform Support Delivery - ClusterHA

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/