I have a simple resource defined:
[root@ha-d1 ~]# pcs resource show dmz1
 Resource: dmz1 (class=ocf provider=internal type=ip-address)
  Attributes: address=172.16.10.192 monitor_link=true
  Meta Attrs: migration-threshold=3 failure-timeout=30s
  Operations: monitor interval=7s (dmz1-monitor-interval-7s)
This is a custom resource agent that provides an Ethernet alias on one of the
interfaces on our system.
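For reference, meta attributes like these are set with a pcs one-liner, roughly:

[root@ha-d1 ~]# pcs resource meta dmz1 migration-threshold=3 failure-timeout=30s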
I can unplug the cable on either node and failover occurs as expected; 30s
after re-plugging it (i.e. once the failure-timeout has expired), I can repeat
the exercise on the opposite node and failover again happens as expected.
However, if I unplug the cable from both nodes, the failcount on each node
climbs past the migration-threshold, the 30s failure-timeout never resets the
failcounts, and Pacemaker never tries to start the failed resource again.
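The only way I have found to recover is to clear the failcounts by hand, with
something like:

[root@ha-d1 ~]# pcs resource cleanup dmz1

after which Pacemaker is willing to start dmz1 again.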
Full list of resources:
 Resource Group: network
     inif       (ocf::internal:ip.sh):  Started ha-d1.dev.com
     outif      (ocf::internal:ip.sh):  Started ha-d2.dev.com
     dmz1       (ocf::internal:ip.sh):  Stopped
 Master/Slave Set: DRBDMaster [DRBDSlave]
     Masters: [ ha-d1.dev.com ]
     Slaves: [ ha-d2.dev.com ]
 Resource Group: filesystem
     DRBDFS     (ocf::heartbeat:Filesystem):    Stopped
 Resource Group: application
     service_failover   (ocf::internal:service_failover):   Stopped
Failcounts for dmz1
ha-d1.dev.com: 4
ha-d2.dev.com: 4
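(That output is from pcs resource failcount show dmz1; querying with
crm_failcount -G -r dmz1 -N <node> should report the same values.)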
Is there any way to automatically recover from this scenario, other than
setting an obnoxiously high migration-threshold?
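By "obnoxiously high" I mean something along the lines of:

[root@ha-d1 ~]# pcs resource meta dmz1 migration-threshold=1000000

which would keep Pacemaker retrying indefinitely, but masks a genuinely broken
resource.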
--
Sam Gardner
Software Engineer
Trustwave | SMART SECURITY ON DEMAND