On Mon, 2021-01-18 at 13:01 +0000, Strahil Nikolov wrote:
> Have you tried the on-fail=ignore option?
>
> Best Regards,
> Strahil Nikolov
on-fail=ignore will act as if the operation succeeded, which probably
isn't desired here. It's usually used for flaky/buggy devices/agents that
sometimes (or always) report failure for successful starts or monitors,
or for noncritical resources where a monitor failure is interesting (in
status displays) but not worth doing anything about.

on-fail=block does make more sense here (essentially it means "wait for a
human to look into it"). Also, I'm not sure whether on-fail=ignore is
allowed for stop.

> On Sunday, 17 January 2021, 20:45:27 GMT+2, Digimer
> <[email protected]> wrote:
>
> Hi all,
>
> I'm trying to figure out how to define a resource such that if it
> fails in any way, it will not cause pacemaker to self-fence. The
> reasoning being that there are relatively minor ways to fault a single
> resource (these are VMs, so for example, a bad edit to the XML
> definition renders it invalid, or the definition is accidentally
> removed).
>
> In a case like this, I fully expect that resource to enter a failed
> state. Of course, pacemaker won't be able to stop it, migrate it, etc.
> When this happens currently, it causes the host to self-fence, taking
> down all other hosted resources (servers). This is less than ideal.
>
> Is there a way to tell pacemaker that if it's unable to manage a
> resource, it flags it as failed and leaves it at that? I've been
> trying to do this and my config so far is:
>
> pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6" \
>     meta allow-migrate="true" target-role="stopped" \
>     op monitor interval="60" start timeout="INFINITY" \
>     on-fail="block" stop timeout="INFINITY" on-fail="block" \
>     migrate_to timeout="INFINITY"
>
> This is getting cumbersome and still, in testing, I'm finding cases
> where the node gets fenced when something breaks the resource in a
> creative way.

I'd expect the above to work. As discussed in the other thread, one case
where it can't work is when it's not there.
:) If you've found some other way where it doesn't work as expected, let
me know. (Of course, there's also the separate possibility of node
failure, manual or DLM-initiated fencing, etc., but I'm sure you're
familiar with all that.)

> Thanks for any insight/guidance!

-- 
Ken Gaillot <[email protected]>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
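[Editor's note: for readers following along, here is one way the quoted
command might be restructured so that each operation explicitly carries
its own on-fail setting. This is a sketch based on pcs's `op` syntax, not
a configuration confirmed by the thread participants; the resource name,
agent, and timeout values are carried over from the quoted command, and
the placement of each on-fail should be verified against your pcs
version's documentation before use.]

```shell
# Sketch (unverified): give every operation an explicit on-fail="block"
# so a failed start/stop/monitor leaves the resource blocked for a human
# to investigate, rather than escalating to node fencing.
pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6" \
    meta allow-migrate="true" target-role="stopped" \
    op monitor interval="60" on-fail="block" \
    op start timeout="INFINITY" on-fail="block" \
    op stop timeout="INFINITY" on-fail="block" \
    op migrate_to timeout="INFINITY"
```

Note that, as mentioned above, whether on-fail is honored for stop (and
what values are allowed there) is the part most worth double-checking,
since stop failures are exactly what normally triggers fencing.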
