On 08.03.2021 11:57, Ulrich Windl wrote:
>>>> Reid Wahl <[email protected]> wrote on 08.03.2021 at 08:42 in message
> <capiuu9_v0-3k9k-z8+z5u5t8bmh3sl3pzzdolh9g8xcdmfq...@mail.gmail.com>:
>> Did the "active on too many nodes" message happen right after a probe? If
>> so, then it does sound like the probe returned code 0.
>
> Events were like this (I greatly condensed the logs):
> (DC h16 being stopped)
> Mar 05 09:53:45 h16 pacemaker-schedulerd[7189]: notice: * Migrate prm_xen_v09 ( h16 -> h18 )
> Mar 05 09:54:23 h16 pacemaker-controld[7190]: notice: Initiating migrate_to operation prm_xen_v09_migrate_to_0 locally on h16
> Mar 05 09:54:24 h16 libvirtd[8531]: internal error: Failed to send migration data to destination host
> Mar 05 09:54:24 h16 VirtualDomain(prm_xen_v09)[1834]: ERROR: v09: live migration to h18 failed: 1
> Mar 05 09:54:24 h16 pacemaker-controld[7190]: notice: Transition 1000 action 125 (prm_xen_v09_migrate_to_0 on h16): expected 'ok' but got 'error'
> Mar 05 09:54:47 h16 pacemaker-schedulerd[7189]: error: Resource prm_xen_v09 is active on 2 nodes (attempting recovery)
> (not really active on two nodes; DC recovers on h18 where v09 probably isn't running, but should stop on h16 first)
> Mar 05 09:54:47 h16 pacemaker-schedulerd[7189]: notice: * Recover prm_xen_v09 ( h18 )
> Mar 05 09:54:47 h16 VirtualDomain(prm_xen_v09)[2068]: INFO: Issuing graceful shutdown request for domain v09.
> Mar 05 09:55:12 h16 pacemaker-execd[7187]: notice: prm_xen_v09 stop (call 297, PID 2035) exited with status 0 (execution time 25101ms, queue time 0ms)
> Mar 05 09:55:12 h16 pacemaker-controld[7190]: notice: Result of stop operation for prm_xen_v09 on h16: ok
> Mar 05 09:55:14 h16 pacemaker-controld[7190]: notice: Transition 1001 aborted by operation prm_xen_v09_start_0 'modify' on h18: Event failed
> Mar 05 09:55:14 h16 pacemaker-controld[7190]: notice: Transition 1001 action 117 (prm_xen_v09_start_0 on h16): expected 'ok' but got 'error'
> Mar 05 09:55:15 h16 pacemaker-schedulerd[7189]: warning: Unexpected result (error: v09: live migration to h18 failed: 1) was recorded for migrate_to of prm_xen_v09 on h16 at Mar 5 09:54:23 2021
>
> Mar 05 09:55:15 h18 pacemaker-execd[7129]: notice: prm_xen_v09 stop (call 262, PID 46737) exited with status 0 (execution time 309ms, queue time 0ms)
>
> (DC shut down)
> Mar 05 09:55:20 h16 pacemakerd[7183]: notice: Shutdown complete
> Mar 05 09:55:20 h16 systemd[1]: Stopped Corosync Cluster Engine.
>
> (node starting after being stopped)
> Mar 05 10:38:50 h16 systemd[1]: Starting Shared-storage based fencing daemon...
> Mar 05 10:38:50 h16 systemd[1]: Starting Corosync Cluster Engine...
> Mar 05 10:38:59 h16 pacemaker-controld[14022]: notice: Quorum acquired
> Mar 05 10:39:00 h16 pacemaker-controld[14022]: notice: State transition S_PENDING -> S_NOT_DC
> (this probe probably reported nonsense)
> Mar 05 10:39:02 h16 pacemaker-controld[14022]: notice: Result of probe operation for prm_xen_v09 on h16: ok

So the resource agent thinks the resource is active.

> (DC noticed)
> Mar 05 10:39:02 h18 pacemaker-controld[7132]: notice: Transition 5 action 58 (prm_xen_v09_monitor_0 on h16): expected 'not running' but got 'ok'
> (from now on probes should be more reliable)
> Mar 05 10:39:07 h16 systemd[1]: Started Virtualization daemon.
> (there is nothing to stop)
> Mar 05 10:39:09 h16 pacemaker-execd[14019]: notice: executing - rsc:prm_xen_v09 action:stop call_id:166
> (obviously)
> Mar 05 10:40:11 h16 libvirtd[15490]: internal error: Failed to shutdown domain '20' with libxenlight
> (more nonsense)
> Mar 05 10:44:04 h16 VirtualDomain(prm_xen_v09)[17306]: INFO: Issuing forced shutdown (destroy) request for domain v09.
> (eventually)
> Mar 05 10:44:07 h16 pacemaker-controld[14022]: notice: Result of stop operation for prm_xen_v09 on h16: ok
> Mar 05 10:44:07 h16 pacemaker-execd[14019]: notice: executing - rsc:prm_xen_v09 action:start call_id:168
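One quick way to find out what such a probe actually returns is to run the agent's monitor action by hand on the freshly booted node, before libvirtd is up, and look at the exit code. A minimal sketch; the config path and hypervisor URI below are placeholders, not values taken from your cluster:

  # Bypass the cluster and run the agent's check on the local node only
  crm_resource --resource prm_xen_v09 --force-check -V

  # Or call the resource agent directly with a hand-built OCF environment;
  # further OCF_RESKEY_* values may be needed to match your CIB.
  OCF_ROOT=/usr/lib/ocf \
  OCF_RESOURCE_INSTANCE=prm_xen_v09 \
  OCF_RESKEY_config=/etc/xen/vm/v09.xml \
  OCF_RESKEY_hypervisor=xen:///system \
  /usr/lib/ocf/resource.d/heartbeat/VirtualDomain monitor
  echo "rc=$?"   # 0 = OCF_SUCCESS (running), 7 = OCF_NOT_RUNNING, anything else = error

If that reports rc=0 while the domain is clearly not running, the agent (or the environment it sees that early in boot) is what needs debugging, not Pacemaker.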
>
>> If a probe returned 0 and it **shouldn't** have done so, then either the
>> monitor operation needs to be redesigned, or resource-discovery=never (or
>> resource-discovery=exclusive) can be used to prevent the probe from
>> happening where it should not.
>
> Well, the situation here is using virtlockd with indirect locking in a
> cluster where the cluster itself provides the shared filesystem used for
> locking.
>
> Then the obvious ordering is:
> 1) Provide the shared filesystem (mount it)
> 2) Start virtlockd (to put the lock files in a shared place)
> 3) Run libvirtd (using virtlockd)
> 4) Manage VMs using libvirt
>
> Unfortunately probes (expecting to use libvirt) are being run even before 1),
> and I don't know why they return success then.

That is what you need to investigate. A probe needs to answer "is the resource active *now*?". If the probe for a resource is impossible until some other resources are active, something is really wrong with the design. Either the resource is active or it is not.
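Concretely, a probe that cannot reach libvirt should say so instead of guessing. The following is only an illustrative sketch of such a guard, not a quote from the shipped VirtualDomain agent; the URI default and the domain name are placeholders:

  # Illustrative only; assumes ocf-shellfuncs has been sourced, as in any OCF agent.
  probe_domain() {
      local uri="${OCF_RESKEY_hypervisor:-xen:///system}"   # placeholder default
      local domain="v09"                                    # placeholder domain name

      # If we cannot talk to the hypervisor at all, the state is unknown:
      # report an error instead of guessing "running" or "not running".
      if ! virsh --connect "$uri" list --all >/dev/null 2>&1; then
          ocf_log err "cannot connect to $uri; state of $domain is unknown"
          return "$OCF_ERR_GENERIC"
      fi

      if virsh --connect "$uri" domstate "$domain" 2>/dev/null | grep -q '^running'; then
          return "$OCF_SUCCESS"       # definitely active right now
      fi
      return "$OCF_NOT_RUNNING"       # we could check, and it is not active
  }

A failed probe is noisy, but at least it is visible; a wrong "ok" or "not running" silently feeds the scheduler a false picture of where the resource runs, which is what the 10:39:02 probe did here.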
> (Other VMs were probed as "not running")
>
>> If a probe returned 0 and it **should** have done so, but the stop
>> operation on the other node wasn't reflected in the CIB (so that the
>> resource still appeared to be active there), then that's odd.
>
> Well, when reviewing the logs, the cluster may actually have had v09 running
> on h16 even though the node was stopped.
> So the problem was with stopping, not starting, but I still doubt that the
> probe at that time was quite reliable.
>
>> A bug is certainly possible, though we can't say without more detail :)
>
> I see what you mean.
>
> Regards,
> Ulrich
>
>> On Sun, Mar 7, 2021 at 11:10 PM Ulrich Windl <[email protected]> wrote:
>>> Reid Wahl <[email protected]> wrote on 05.03.2021 at 21:22 in message
>>> <capiuu991o08dnavkm9bc8n9bk-+nh9e0_f25o6ddis5wzwg...@mail.gmail.com>:
>>>> On Fri, Mar 5, 2021 at 10:13 AM Ken Gaillot <[email protected]> wrote:
>>>>> On Fri, 2021-03-05 at 11:39 +0100, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I'm unsure what actually causes a problem I see (a resource was
>>>>>> "detected running" when it actually was not), but I'm sure some probe
>>>>>> started on cluster node start cannot provide a useful result until
>>>>>> some other resource has been started. AFAIK there is no way to make a
>>>>>> probe obey ordering or colocation constraints, so the only work-around
>>>>>> seems to be a delay. However I'm unsure whether probes can actually
>>>>>> be delayed.
>>>>>>
>>>>>> Ideas?
>>>>>
>>>>> Ordered probes are a thorny problem that we've never been able to come
>>>>> up with a general solution for. We do order certain probes where we
>>>>> have enough information to know it's safe. The problem is that it is
>>>>> very easy to introduce ordering loops.
>>>>>
>>>>> I don't remember if there are any workarounds.
>>>>
>>>> Maybe as a workaround:
>>>> - Add an ocf:pacemaker:attribute resource after-and-with rsc1
>>>> - Then configure a location rule for rsc2 with resource-discovery=never
>>>>   and score=-INFINITY with expression (in pseudocode) "attribute is not
>>>>   set to active value"
>>>>
>>>> I haven't tested it, but that might cause rsc2's probe to wait until rsc1
>>>> is active.
>>>>
>>>> And of course, use the usual constraints/rules to ensure rsc2's probe
>>>> only runs on rsc1's node.
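Spelled out, that workaround might look roughly like this in crm shell syntax (untested and from memory; rsc1, rsc2, the resource id and the attribute name are placeholders):

  # A node attribute that is set to 1 only while rsc1 is active on the node
  crm configure primitive rsc1-active ocf:pacemaker:attribute \
      params name=rsc1-active active_value=1 inactive_value=0 \
      op monitor interval=10s
  crm configure order o-rsc1-before-attr Mandatory: rsc1 rsc1-active
  crm configure colocation c-attr-with-rsc1 inf: rsc1-active rsc1

  # Never probe (or place) rsc2 on nodes where the attribute is absent or not 1
  crm configure location l-rsc2-needs-rsc1 rsc2 resource-discovery=never \
      rule -inf: not_defined rsc1-active or rsc1-active ne 1

The exact parameter names for the attribute agent can be checked with "crm ra info ocf:pacemaker:attribute"; the XML equivalent is a rsc_location with resource-discovery="never" and a -INFINITY rule on that node attribute.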
>>>>>> Despite that, I wonder whether some probe/monitor return code like
>>>>>> OCF_NOT_READY would make sense if the operation detects that it
>>>>>> cannot return a current status (so both "running" and "stopped" would
>>>>>> be as inadequate as "starting" and "stopping" would be, despite the
>>>>>> fact that the latter two do not exist).
>>>>>
>>>> This seems logically reasonable, independent of any implementation
>>>> complexity and considerations of what we would do with that return code.
>>>
>>> Thanks for the proposal!
>>> The actual problem I was facing was that the cluster claimed some resource
>>> was running on two nodes at the same time, when actually one node had been
>>> stopped properly (with all its resources). The bad state in the CIB was
>>> most likely due to a software bug in pacemaker, but probes on re-starting
>>> the node seemed not to prevent pacemaker from doing a really wrong
>>> "recovery action".
>>> My hope was that probes might update the CIB before some stupid action is
>>> being done. Maybe it's just another software bug...
>>>
>>> Regards,
>>> Ulrich
>>>
>>>>>> Regards,
>>>>>> Ulrich
>>>>>
>>>>> --
>>>>> Ken Gaillot <[email protected]>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Reid Wahl, RHCA
>>>> Senior Software Maintenance Engineer, Red Hat
>>>> CEE - Platform Support Delivery - ClusterHA

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/