Hi, Antony. failure-timeout should be a resource meta attribute, not an attribute of the monitor operation. At least I'm not aware of it being configurable per-operation -- maybe it is. Can't check at the moment :)
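
For illustration, here is a rough sketch of the two primitives with failure-timeout moved out of the monitor operation and into the meta section. It is untested, it assumes the crm shell syntax used in your configuration quoted below, and every other value is copied from that configuration unchanged:

```
# Sketch only, untested: failure-timeout is set as a resource meta attribute
# (next to migration-threshold) instead of as an option of "op monitor".
primitive IP-float4 IPaddr2 \
    params ip=10.1.0.5 cidr_netmask=24 \
    meta migration-threshold=3 failure-timeout=180 \
    op monitor interval=10 timeout=30 on-fail=restart
primitive IPsecVPN lsb:ipsecwrapper \
    meta migration-threshold=3 failure-timeout=180 \
    op monitor interval=10 timeout=30 on-fail=restart
```

The group and property sections would stay as they are.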
On Wednesday, March 31, 2021, Antony Stone <antony.st...@ha.open.source.it> wrote:
> Hi.
>
> I've pared my configuration down to almost a bare minimum to demonstrate the
> problem I'm having.
>
> I have two questions:
>
> 1. What command can I use to find out what pacemaker thinks my cluster.cib file
> really means?
>
> I know what I put in it, but I want to see what pacemaker has understood from
> it, to make sure that pacemaker has the same idea about how to manage my
> resources as I do.
>
>
> 2. Can anyone tell me what the problem is with the following cluster.cib
> (lines split on spaces to make things more readable, the actual file consists
> of four lines of text):
>
> primitive IP-float4
>     IPaddr2
>     params
>     ip=10.1.0.5
>     cidr_netmask=24
>     meta
>     migration-threshold=3
>     op
>     monitor
>     interval=10
>     timeout=30
>     on-fail=restart
>     failure-timeout=180
> primitive IPsecVPN
>     lsb:ipsecwrapper
>     meta
>     migration-threshold=3
>     op
>     monitor
>     interval=10
>     timeout=30
>     on-fail=restart
>     failure-timeout=180
> group Everything
>     IP-float4
>     IPsecVPN
>     resource-stickiness=100
> property cib-bootstrap-options:
>     stonith-enabled=no
>     no-quorum-policy=stop
>     start-failure-is-fatal=false
>     cluster-recheck-interval=60s
>
> My problem is that "failure-timeout" is not being honoured. A resource
> failure simply never times out, and 3 failures (over a fortnight, if that's
> how long it takes to get 3 failures) mean that the resources move.
>
> I want a failure to be forgotten about after 180 seconds (or at least, soon
> after that - 240 seconds would be fine, if cluster-recheck-interval means that
> 180 can't quite be achieved).
>
> Somehow or other, _far_ more than 180 seconds go by, and I *still* have:
>
> fail-count=1 last-failure='Wed Mar 31 21:23:11 2021'
>
> as part of the output of "crm status -f" (the above timestamp is BST, so
> that's 70 minutes ago now).
>
>
> Thanks for any help,
>
>
> Antony.
>
> --
> Don't procrastinate - put it off until tomorrow.
>
> Please reply to the list;
> please *don't* CC me.
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

--
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA