[ClusterLabs] Pacemaker resource start is delayed while another resource is starting

lkxjtu Fri, 27 Oct 2017 09:05:57 -0700

I have two clone resources in my corosync/pacemaker cluster: fm_mgt and
logserver. Both use OCF resource agents. fm_mgt takes 1 minute to start its
service (the OCF start function runs for 1 minute). The configuration is as
follows:

# crm configure show
node 168002177: 192.168.2.177
node 168002178: 192.168.2.178
node 168002179: 192.168.2.179
primitive fm_mgt fm_mgt \
        op monitor interval=20s timeout=120s \
        op stop interval=0 timeout=120s on-fail=restart \
        op start interval=0 timeout=120s on-fail=restart \
        meta target-role=Started
primitive logserver logserver \
        op monitor interval=20s timeout=120s \
        op stop interval=0 timeout=120s on-fail=restart \
        op start interval=0 timeout=120s on-fail=restart \
        meta target-role=Started
clone fm_mgt_replica fm_mgt
clone logserver_replica logserver
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.13-10.el7-44eb2dd \
        cluster-infrastructure=corosync \
        stonith-enabled=false \
        start-failure-is-fatal=false

When I kill the fm_mgt service on one node, pacemaker recovers it immediately
after the monitor fails. This looks perfectly normal. But if I kill the
logserver service on any node during the 1 minute in which fm_mgt is starting,
the monitor catches the failure as usual, yet pacemaker does not restart
logserver immediately. Instead it waits until fm_mgt has finished starting,
and only then begins restarting logserver. It seems that there is some
dependency between pacemaker resources.
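
The sequence described above can be sketched as a small reproduction script.
This is only an illustration under assumptions: the process names matched by
pkill ("fm_mgt" and "logserver") and the 30-second wait are guesses, not
values taken from the real services, and the run helper with its DRY_RUN
switch is hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Reproduction sketch. ASSUMPTION: the daemons can be matched by the
# process names "fm_mgt" and "logserver"; adjust to the real names.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Kill fm_mgt; once its monitor fails, the 1-minute start begins.
run pkill -f fm_mgt
# 2. ASSUMPTION: 30s is enough for the 20s monitor to notice the failure
#    while the start of fm_mgt is still in progress.
run sleep 30
# 3. Kill logserver while fm_mgt is still starting.
run pkill -f logserver
# 4. Print the cluster status once.
run crm_mon -1
```

Running it with DRY_RUN=0 on one cluster node executes the commands; the
final status is then expected to resemble the output below.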

# crm status
Last updated: Thu Oct 26 06:40:24 2017          Last change: Thu Oct 26 06:36:33 2017 by root via crm_resource on 192.168.2.177
Stack: corosync
Current DC: 192.168.2.179 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
3 nodes and 6 resources configured
Online: [ 192.168.2.177 192.168.2.178 192.168.2.179 ]
Full list of resources:
 Clone Set: logserver_replica [logserver]
     logserver  (ocf::heartbeat:logserver):     FAILED 192.168.2.177
     Started: [ 192.168.2.178 192.168.2.179 ]
 Clone Set: fm_mgt_replica [fm_mgt]
     Started: [ 192.168.2.178 192.168.2.179 ]
     Stopped: [ 192.168.2.177 ]

I am very confused. Is there something wrong with my configuration? Thank you
very much!

Best regards,
James

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
