Re: [ClusterLabs] Pacemaker not restarting Resource on same node
On Thu, 2018-06-28 at 19:58 +0300, Andrei Borzenkov wrote:
> 28.06.2018 18:35, Dileep V Nair wrote:
> > Hi,
> >
> > I have a cluster with DB2 running in HADR mode. I have used the db2
> > resource agent. My problem is that whenever DB2 fails on the primary,
> > it migrates to the secondary node. Ideally it should restart thrice
> > (Migration Threshold set to 3), but that is not happening. This is
> > causing extra downtime for the customer. Are there any other settings
> > or parameters that need to be set? Has anyone faced a similar issue?
> > I am on pacemaker version 1.1.15-21.1.
>
> It is impossible to answer without good knowledge of the application
> and the resource agent. From a quick look at the resource agent, it
> removes the master score from the current node if a database failure
> is detected, which means the current node will not be eligible for
> fail-over.
>
> Note that pacemaker does not really have a concept of "restarting a
> resource on the same node". Every time, it performs full node
> selection using the current scores. It usually happens to be the "same
> node" simply due to non-zero resource stickiness by default. You could
> attempt to adjust stickiness so that the final score will be larger
> than the master score on the standby. But that also needs agent
> cooperation: are you sure the agent will even attempt to restart a
> failed master locally?

Also, some types of errors cannot be recovered by a restart on the same
node. For example, by default, start failures will not be retried on the
same node (see the cluster property start-failure-is-fatal), to avoid a
repeatedly failing start preventing the cluster from doing anything else.
Certain OCF resource agent exit codes are considered "hard" errors that
prevent retrying on the same node: missing dependencies, file permission
errors, etc.
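The distinction between failures that can and cannot be retried on the same node can be sketched in Python. The exit-code numbers and the soft/hard/fatal recovery types follow the OCF exit-code table in the Pacemaker Explained documentation. The function name classify_failure, and the choice to label a start failure under start-failure-is-fatal=true as "hard" (Pacemaker actually sets the fail-count to INFINITY on that node), are assumptions of this sketch, not Pacemaker's own code.

```python
from enum import IntEnum


class OCF(IntEnum):
    """OCF resource agent exit codes."""
    SUCCESS = 0
    ERR_GENERIC = 1
    ERR_ARGS = 2
    ERR_UNIMPLEMENTED = 3
    ERR_PERM = 4          # e.g. file permission errors
    ERR_INSTALLED = 5     # e.g. missing dependencies
    ERR_CONFIGURED = 6
    NOT_RUNNING = 7
    RUNNING_MASTER = 8
    FAILED_MASTER = 9


# Recovery types as listed in the Pacemaker Explained exit-code table
HARD = {OCF.ERR_ARGS, OCF.ERR_UNIMPLEMENTED, OCF.ERR_PERM, OCF.ERR_INSTALLED}
FATAL = {OCF.ERR_CONFIGURED}


def classify_failure(op: str, rc: int, start_failure_is_fatal: bool = True) -> str:
    """Simplified model of the recovery scope for an operation result.

    'ok'    : operation succeeded
    'soft'  : may be recovered on the same node (subject to migration-threshold)
    'hard'  : not retried on this node
    'fatal' : resource is stopped on all nodes
    """
    code = OCF(rc)  # raises ValueError for codes outside 0-9
    if code is OCF.SUCCESS:
        return "ok"
    if code in FATAL:
        return "fatal"
    if code in HARD:
        return "hard"
    if op == "start" and start_failure_is_fatal:
        # start-failure-is-fatal=true sets the fail-count to INFINITY,
        # so this node is not retried (modeled here as 'hard')
        return "hard"
    return "soft"


print(classify_failure("monitor", 1))  # soft
print(classify_failure("start", 1))    # hard
print(classify_failure("monitor", 5))  # hard
```

With start-failure-is-fatal set to false, a generic start failure falls back to "soft" in this model and is counted against migration-threshold instead.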
--
Ken Gaillot

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Pacemaker not restarting Resource on same node
28.06.2018 18:35, Dileep V Nair wrote:
> Hi,
>
> I have a cluster with DB2 running in HADR mode. I have used the db2
> resource agent. My problem is that whenever DB2 fails on the primary,
> it migrates to the secondary node. Ideally it should restart thrice
> (Migration Threshold set to 3), but that is not happening. This is
> causing extra downtime for the customer. Are there any other settings
> or parameters that need to be set? Has anyone faced a similar issue?
> I am on pacemaker version 1.1.15-21.1.

It is impossible to answer without good knowledge of the application and
the resource agent. From a quick look at the resource agent, it removes
the master score from the current node if a database failure is
detected, which means the current node will not be eligible for
fail-over.

Note that pacemaker does not really have a concept of "restarting a
resource on the same node". Every time, it performs full node selection
using the current scores. It usually happens to be the "same node"
simply due to non-zero resource stickiness by default. You could attempt
to adjust stickiness so that the final score will be larger than the
master score on the standby. But that also needs agent cooperation: are
you sure the agent will even attempt to restart a failed master locally?
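The score arithmetic described above can be illustrated with a toy model in Python. It assumes a two-node master/slave pair with hypothetical node names node1 (current master) and node2 (standby) and made-up master scores; it models a master score removed by the agent as -INFINITY. It only shows stickiness being added to the current node and Pacemaker-style score addition (where -INFINITY wins over INFINITY); it is not Pacemaker's real allocation algorithm.

```python
from typing import Dict, Optional

INFINITY = 1_000_000  # Pacemaker's score "INFINITY"


def add_scores(a: int, b: int) -> int:
    """Saturating score addition; -INFINITY takes precedence over INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def choose_master(master_scores: Dict[str, int], current: Optional[str],
                  stickiness: int) -> Optional[str]:
    """Toy model: pick the node with the highest master score, adding
    stickiness to the node that currently holds the master role."""
    best: Optional[str] = None
    best_score = -INFINITY
    for node in sorted(master_scores):  # sorted for deterministic ties
        score = master_scores[node]
        if node == current:
            score = add_scores(score, stickiness)
        if score > best_score:  # a node at -INFINITY is never chosen
            best, best_score = node, score
    return best


# Hypothetical scores after a failure on node1
print(choose_master({"node1": 0, "node2": 100}, "node1", 50))    # node2
print(choose_master({"node1": 0, "node2": 100}, "node1", 200))   # node1
print(choose_master({"node1": -INFINITY, "node2": 100}, "node1", INFINITY))  # node2
```

The last case mirrors the agent behavior described above: once the master score on the failed node is withdrawn, no stickiness value keeps the master role there, which is why the agent's cooperation matters.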
[ClusterLabs] Pacemaker not restarting Resource on same node
Hi,

I have a cluster with DB2 running in HADR mode. I have used the db2 resource agent. My problem is that whenever DB2 fails on the primary, it migrates to the secondary node. Ideally it should restart thrice (Migration Threshold set to 3), but that is not happening. This is causing extra downtime for the customer. Are there any other settings or parameters that need to be set? Has anyone faced a similar issue? I am on pacemaker version 1.1.15-21.1.

Dileep V Nair
dilen...@in.ibm.com
IBM Services
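How migration-threshold counts failures can be sketched in Python. This assumes only soft monitor failures (no start failures, which start-failure-is-fatal handles differently), no failure-timeout expiry, and an agent that keeps the node eligible. The Pacemaker documentation describes the node as becoming ineligible once the fail-count reaches migration-threshold, so a threshold of 3 gives two local restarts and a move on the third failure. The function name recovery_actions and its action labels are hypothetical, and treating 0 as "disabled" is an assumption of this sketch.

```python
from typing import List


def recovery_actions(failures: int, migration_threshold: int) -> List[str]:
    """Simplified model: what happens after each consecutive soft failure
    of a resource on one node (no failure-timeout, no start failures)."""
    actions: List[str] = []
    for fail_count in range(1, failures + 1):
        # The node becomes ineligible once fail-count reaches the threshold;
        # a threshold of 0 is treated as "disabled" in this sketch.
        if migration_threshold > 0 and fail_count >= migration_threshold:
            actions.append("move")  # resource leaves this node
            break
        actions.append("restart-local")  # stop + start on the same node
    return actions


print(recovery_actions(3, 3))  # ['restart-local', 'restart-local', 'move']
```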