Re: [ClusterLabs] Pacemaker not restarting Resource on same node

2018-06-28 Thread Ken Gaillot
On Thu, 2018-06-28 at 19:58 +0300, Andrei Borzenkov wrote:
> 28.06.2018 18:35, Dileep V Nair wrote:
> > 
> > 
> > Hi,
> > 
> > I have a cluster with DB2 running in HADR mode, using the db2
> > resource agent. My problem is that whenever DB2 fails on the
> > primary, it migrates to the secondary node. Ideally it should
> > restart three times first (migration-threshold is set to 3), but
> > that is not happening. This causes extra downtime for the customer.
> > Are there any other settings or parameters that need to be set?
> > Has anyone faced a similar issue? I am on Pacemaker version
> > 1.1.15-21.1.
> > 
> 
> It is impossible to answer without good knowledge of the application
> and the resource agent. From a quick look at the resource agent, it
> removes the master score from the current node if a database failure
> is detected, which means the current node will not be eligible for
> fail-over.
> 
> Note that Pacemaker does not really have a concept of "restarting a
> resource on the same node". Every time, it performs full node
> selection using the current scores. The result usually happens to be
> the "same node" simply due to non-zero resource stickiness by default.
> You could attempt to adjust stickiness so that the final score is
> larger than the master score on the standby. But that also needs
> agent cooperation: are you sure the agent will even attempt to
> restart a failed master locally?

Also, some types of errors cannot be recovered by a restart on the same
node.

For example, by default, start failures will not be retried on the same
node (see the cluster property start-failure-is-fatal), to avoid a
repeatedly failing start preventing the cluster from doing anything
else. Certain OCF resource agent exit codes are considered "hard"
errors that prevent retrying on the same node: missing dependencies,
file permission errors, etc.
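
To make the rules above concrete, here is a rough Python sketch of the
decision. It is a simplified model for illustration, not Pacemaker's
actual code: the function name and its arguments are made up here, the
exit-code classification follows the recovery types (soft, hard, fatal)
as I understand them from the Pacemaker documentation, and treating
unlisted codes as soft is an assumption.

```python
from enum import Enum


class Recovery(Enum):
    SOFT = "soft"    # may be restarted in place or moved
    HARD = "hard"    # moved elsewhere, not retried on this node
    FATAL = "fatal"  # stopped, not started on any node


# OCF exit codes and the recovery type assumed for each in this sketch.
OCF_RECOVERY = {
    1: ("OCF_ERR_GENERIC", Recovery.SOFT),
    2: ("OCF_ERR_ARGS", Recovery.HARD),
    3: ("OCF_ERR_UNIMPLEMENTED", Recovery.HARD),
    4: ("OCF_ERR_PERM", Recovery.HARD),
    5: ("OCF_ERR_INSTALLED", Recovery.HARD),
    6: ("OCF_ERR_CONFIGURED", Recovery.FATAL),
    9: ("OCF_FAILED_MASTER", Recovery.SOFT),
}


def may_retry_on_same_node(operation, rc, fail_count, migration_threshold,
                           start_failure_is_fatal=True):
    """Return True if this toy model would allow another try on this node.

    fail_count is the number of failures on this node, including the
    one being evaluated (hypothetical bookkeeping for the sketch).
    """
    # Assumption: codes not listed above are treated as soft errors.
    _name, recovery = OCF_RECOVERY.get(rc, ("unknown", Recovery.SOFT))
    if recovery is not Recovery.SOFT:
        return False
    if operation == "start" and start_failure_is_fatal:
        return False
    return fail_count < migration_threshold


# First monitor failure with a generic error, migration-threshold=3.
print(may_retry_on_same_node("monitor", 1, 1, 3))
# A failed start with the default start-failure-is-fatal=true.
print(may_retry_on_same_node("start", 1, 1, 3))
```

Even when this check passes, the resource still goes through normal
node selection, so the agent's scores can send it elsewhere.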
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker not restarting Resource on same node

2018-06-28 Thread Andrei Borzenkov
28.06.2018 18:35, Dileep V Nair wrote:
> 
> 
> Hi,
> 
> I have a cluster with DB2 running in HADR mode, using the db2
> resource agent. My problem is that whenever DB2 fails on the primary,
> it migrates to the secondary node. Ideally it should restart three
> times first (migration-threshold is set to 3), but that is not
> happening. This causes extra downtime for the customer. Are there any
> other settings or parameters that need to be set? Has anyone faced a
> similar issue? I am on Pacemaker version 1.1.15-21.1.
> 

It is impossible to answer without good knowledge of the application
and the resource agent. From a quick look at the resource agent, it
removes the master score from the current node if a database failure is
detected, which means the current node will not be eligible for
fail-over.

Note that Pacemaker does not really have a concept of "restarting a
resource on the same node". Every time, it performs full node selection
using the current scores. The result usually happens to be the "same
node" simply due to non-zero resource stickiness by default. You could
attempt to adjust stickiness so that the final score is larger than the
master score on the standby. But that also needs agent cooperation: are
you sure the agent will even attempt to restart a failed master
locally?
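
To illustrate the scoring argument, here is a toy Python model of the
selection. It is only a sketch under stated assumptions, not the real
scheduler: the function name, the node names and all score values are
hypothetical, and the model simply adds stickiness to the master score
of the node currently holding the master role and picks the highest
total. A node whose master score has been removed (None here) is
treated as ineligible.

```python
from typing import Dict, Optional


def pick_master_node(master_scores: Dict[str, Optional[int]],
                     current: Optional[str],
                     stickiness: int) -> Optional[str]:
    """Pick the node with the highest total score in this toy model."""
    totals = {}
    for node, score in master_scores.items():
        if score is None:
            # The agent removed the master score: node is not a candidate.
            continue
        bonus = stickiness if node == current else 0
        totals[node] = score + bonus
    if not totals:
        return None
    return max(totals, key=totals.get)


# Healthy cluster: the current master keeps its role.
print(pick_master_node({"node1": 10000, "node2": 8000}, "node1", 0))

# Agent removed the score on the failed node: no amount of stickiness
# keeps the master there, so the standby is chosen.
print(pick_master_node({"node1": None, "node2": 8000}, "node1", 1000000))

# If the agent only lowered the score, a large enough stickiness would
# outweigh the standby's master score.
print(pick_master_node({"node1": 100, "node2": 8000}, "node1", 10000))
```

The second case is the one that matters here: if the agent drops the
score entirely, tuning stickiness alone cannot help.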


[ClusterLabs] Pacemaker not restarting Resource on same node

2018-06-28 Thread Dileep V Nair


Hi,

I have a cluster with DB2 running in HADR mode, using the db2 resource
agent. My problem is that whenever DB2 fails on the primary, it migrates
to the secondary node. Ideally it should restart three times first
(migration-threshold is set to 3), but that is not happening. This
causes extra downtime for the customer. Are there any other settings or
parameters that need to be set? Has anyone faced a similar issue? I am
on Pacemaker version 1.1.15-21.1.

Dileep V Nair

dilen...@in.ibm.com

IBM Services