On 08/18/2016 03:17 AM, TEG AMJG wrote:
> Hi
>
> I am having a problem with a simple Active/Passive cluster, which
> consists of the following configuration:
>
> Cluster Name: kamcluster
> Corosync Nodes:
>  kam1vs3 kam2vs3
> Pacemaker Nodes:
>  kam1vs3 kam2vs3
>
> Resources:
>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=10.0.1.206 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>               monitor interval=10s (ClusterIP-monitor-interval-10s)
>  Resource: ClusterIP2 (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=10.0.1.207 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (ClusterIP2-start-interval-0s)
>               stop interval=0s timeout=20s (ClusterIP2-stop-interval-0s)
>               monitor interval=10s (ClusterIP2-monitor-interval-10s)
>  Resource: rtpproxycluster (class=systemd type=rtpproxy)
>   Operations: monitor interval=10s (rtpproxycluster-monitor-interval-10s)
>               stop interval=0s on-fail=block (rtpproxycluster-stop-interval-0s)
>  Resource: kamailioetcfs (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/etc/kamailio fstype=ext4
>   Operations: start interval=0s timeout=60 (kamailioetcfs-start-interval-0s)
>               monitor interval=10s on-fail=fence (kamailioetcfs-monitor-interval-10s)
>               stop interval=0s on-fail=fence (kamailioetcfs-stop-interval-0s)
>  Clone: fence_kam2_xvm-clone
>   Meta Attrs: interleave=true clone-max=2 clone-node-max=1
>   Resource: fence_kam2_xvm (class=stonith type=fence_xvm)
>    Attributes: port=tegamjg_kam2 pcmk_host_list=kam2vs3
>    Operations: monitor interval=60s (fence_kam2_xvm-monitor-interval-60s)
>  Master: kamailioetcclone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true on-fail=fence
>   Resource: kamailioetc (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=kamailioetc
>    Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
>                promote interval=0s on-fail=fence (kamailioetc-promote-interval-0s)
>                demote interval=0s on-fail=fence (kamailioetc-demote-interval-0s)
>                stop interval=0s on-fail=fence (kamailioetc-stop-interval-0s)
>                monitor interval=10s (kamailioetc-monitor-interval-10s)
>  Clone: fence_kam1_xvm-clone
>   Meta Attrs: interleave=true clone-max=2 clone-node-max=1
>   Resource: fence_kam1_xvm (class=stonith type=fence_xvm)
>    Attributes: port=tegamjg_kam1 pcmk_host_list=kam1vs3
>    Operations: monitor interval=60s (fence_kam1_xvm-monitor-interval-60s)
>  Resource: kamailiocluster (class=ocf provider=heartbeat type=kamailio)
>   Attributes: listen_address=10.0.1.206 conffile=/etc/kamailio/kamailio.cfg pidfile=/var/run/kamailio.pid monitoring_ip=10.0.1.206 monitoring_ip2=10.0.1.207 port=5060 proto=udp kamctlrc=/etc/kamailio/kamctlrc shmem=128 pkg=8
>   Meta Attrs: target-role=Stopped
>   Operations: start interval=0s timeout=60 (kamailiocluster-start-interval-0s)
>               stop interval=0s timeout=30 (kamailiocluster-stop-interval-0s)
>               monitor interval=5s (kamailiocluster-monitor-interval-5s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
>   start fence_kam1_xvm-clone then start fence_kam2_xvm-clone (kind:Mandatory) (id:order-fence_kam1_xvm-clone-fence_kam2_xvm-clone-mandatory)
>   start fence_kam2_xvm-clone then promote kamailioetcclone (kind:Mandatory) (id:order-fence_kam2_xvm-clone-kamailioetcclone-mandatory)
>   promote kamailioetcclone then start kamailioetcfs (kind:Optional) (id:order-kamailioetcclone-kamailioetcfs-Optional)
>   Resource Sets:
>     set kamailioetcfs sequential=true (id:pcs_rsc_set_kamailioetcfs) set ClusterIP ClusterIP2 sequential=false (id:pcs_rsc_set_ClusterIP_ClusterIP2) set rtpproxycluster kamailiocluster sequential=true (id:pcs_rsc_set_rtpproxycluster_kamailiocluster) (id:pcs_rsc_order_set_kamailioetcfs_set_ClusterIP_ClusterIP2_set_rtpproxycluster_kamailiocluster)
> Colocation Constraints:
>   rtpproxycluster with ClusterIP2 (score:INFINITY) (id:colocation-rtpproxycluster-ClusterIP2-INFINITY)
>   ClusterIP2 with ClusterIP (score:INFINITY) (id:colocation-ClusterIP2-ClusterIP-INFINITY)
>   ClusterIP with kamailioetcfs (score:INFINITY) (id:colocation-ClusterIP-kamailioetcfs-INFINITY)
>   kamailioetcfs with kamailioetcclone (score:INFINITY) (with-rsc-role:Master) (id:colocation-kamailioetcfs-kamailioetcclone-INFINITY)
>   fence_kam2_xvm-clone with fence_kam1_xvm-clone (score:INFINITY) (id:colocation-fence_kam2_xvm-clone-fence_kam1_xvm-clone-INFINITY)
>   kamailioetcclone with fence_kam2_xvm-clone (score:INFINITY) (id:colocation-kamailioetcclone-fence_kam2_xvm-clone-INFINITY)
>   kamailiocluster with rtpproxycluster (score:INFINITY) (id:colocation-kamailiocluster-rtpproxycluster-INFINITY)
>
> Resources Defaults:
>  migration-threshold: 2
>  failure-timeout: 10m
>  resource-stickiness: 200
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: kamcluster
>  dc-version: 1.1.13-10.el7_2.2-44eb2dd
>  have-watchdog: false
>  last-lrm-refresh: 1471479940
>  no-quorum-policy: ignore
>  start-failure-is-fatal: false
>  stonith-action: reboot
>  stonith-enabled: true
Your Pacemaker version is fairly old. Somewhere in the back of my mind I remember seeing similar delays (with a subsequent monitor taking the resource out of the bad situation) while investigating
https://github.com/ClusterLabs/pacemaker/commit/5e804ec3eab232d96b2ede264bb9286294ae33f4
Maybe you could give it a try with something newer, or at least with that commit added. It is just one additional check, plus doing the systemd daemon-reload every time it is requested rather than only every 10th time.

Anyway - this might be related to the systemd reloads needed when the override files are written/deleted. Either systemd is waiting on something else before it picks up the changes, or the reload itself is simply taking that long...
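One way to narrow it down would be to time the raw systemd operations outside of Pacemaker and compare them with the ~2 s exec-time the lrmd reports in your logs. A rough sketch, assuming the unit behind the systemd:rtpproxy resource is rtpproxy.service (adjust the unit and resource names to your setup):

    # how long does a daemon-reload take on this node?
    time systemctl daemon-reload

    # take the resource out of Pacemaker's hands so the cluster does not
    # react to the manual stop, then time a plain stop/start of the unit
    pcs resource unmanage rtpproxycluster
    time systemctl stop rtpproxy.service
    time systemctl start rtpproxy.service
    pcs resource manage rtpproxycluster

If the bare stop/start of the unit returns well under a second while the daemon-reload takes on the order of seconds, that would point at the reload behaviour described above rather than at rtpproxy itself.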
> Now my problem is: when the rtpproxy systemd resource starts or stops, I see a 2 second delay between the moment the start/stop request is sent to systemd and the moment the result of the action is confirmed and reported to the crmd. I am pretty sure the service itself does not take even half of that time to start or stop. I tried the same thing with a dummy service as a systemd resource and it also takes 2 seconds to start/stop. This does not happen with any of the OCF resources. Here is what I see in the logs about this behaviour.
>
> This is after doing "pcs resource restart rtpproxycluster":
>
> Aug 17 20:59:18 [13187] kam1 crmd: notice: te_rsc_command: Initiating action 14: stop rtpproxycluster_stop_0 on kam1vs3 (local)
> Aug 17 20:59:18 [13184] kam1 lrmd: info: cancel_recurring_action: Cancelling systemd operation rtpproxycluster_status_10000
> Aug 17 20:59:18 [13187] kam1 crmd: info: do_lrm_rsc_op: Performing key=14:30:0:8a202722-ece2-4617-b26e-8d4aa5f3522b op=rtpproxycluster_stop_0
> Aug 17 20:59:18 [13184] kam1 lrmd: info: log_execute: executing - rsc:rtpproxycluster action:stop call_id:106
> Aug 17 20:59:18 [13187] kam1 crmd: info: process_lrm_event: Operation rtpproxycluster_monitor_10000: Cancelled (node=kam1vs3, call=104, confirmed=true)
> Aug 17 20:59:18 [13184] kam1 lrmd: info: systemd_exec_result: Call to stop passed: /org/freedesktop/systemd1/job/8302
> Aug 17 20:59:20 [13187] kam1 crmd: notice: process_lrm_event: Operation rtpproxycluster_stop_0: ok (node=kam1vs3, call=106, rc=0, cib-update=134, confirmed=true)
> Aug 17 20:59:20 [13182] kam1 cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='rtpproxycluster']/lrm_rsc_op[@id='rtpproxycluster_last_0']: @operation_key=rtpproxycluster_stop_0, @operation=stop, @transition-key=14:30:0:8a202722-ece2-4617-b26e-8d4aa5f3522b, @transition-magic=0:0;14:30:0:8a202722-ece2-4617-b26e-8d4aa5f3522b, @call-id=106, @last-run=1471481958, @last-rc-change=1471481958, @exec-time=2116
>
> Aug 17 20:59:20 [13186] kam1 pengine: info: RecurringOp: Start recurring monitor (10s) for rtpproxycluster on kam1vs3
> Aug 17 20:59:20 [13186] kam1 pengine: notice: LogActions: Start rtpproxycluster (kam1vs3)
> Aug 17 20:59:20 [13187] kam1 crmd: notice: te_rsc_command: Initiating action 13: start rtpproxycluster_start_0 on kam1vs3 (local)
> Aug 17 20:59:20 [13187] kam1 crmd: info: do_lrm_rsc_op: Performing key=13:31:0:8a202722-ece2-4617-b26e-8d4aa5f3522b op=rtpproxycluster_start_0
> Aug 17 20:59:20 [13184] kam1 lrmd: info: log_execute: executing - rsc:rtpproxycluster action:start call_id:107
> Aug 17 20:59:21 [13184] kam1 lrmd: info: systemd_exec_result: Call to start passed: /org/freedesktop/systemd1/job/8303
> Aug 17 20:59:23 [13187] kam1 crmd: notice: process_lrm_event: Operation rtpproxycluster_start_0: ok (node=kam1vs3, call=107, rc=0, cib-update=136, confirmed=true)
> Aug 17 20:59:23 [13182] kam1 cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='rtpproxycluster']/lrm_rsc_op[@id='rtpproxycluster_last_0']: @operation_key=rtpproxycluster_start_0, @operation=start, @transition-key=13:31:0:8a202722-ece2-4617-b26e-8d4aa5f3522b, @transition-magic=0:0;13:31:0:8a202722-ece2-4617-b26e-8d4aa5f3522b, @call-id=107, @last-run=1471481960, @last-rc-change=1471481960, @exec-time=2068
>
> Why is this happening? Many of my resources depend on each other, so the delay during a failover adds up quickly (4 seconds of stop/start delay just for that one systemd resource).
>
> Is this normal? Is there any way to solve it? Do I really need to write an OCF resource agent for that service (which would take time that I don't really have at the moment)?
>
> Regards
>
> Alejandro

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org