[ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
>>> Jehan-Guillaume de Rorthais wrote on 19.05.2016 at >>> 21:29 in message <20160519212947.6cc0fd7b@firost>: [...] > I was thinking of a use case where a graceful demote or stop action failed > multiple times, to give the RA a chance to choose another method to > stop > the resource before it requires a migration. For instance, PostgreSQL has 3 > different kinds of stop, the last one not being graceful, but still better > than > a kill -9. For example the Xen RA tries a clean shutdown with a timeout of about 2/3 of the configured stop timeout; if it fails it shuts the VM down the hard way. I don't know Postgres in detail, but I could imagine a three-step approach: 1) Shutdown after current operations have finished 2) Shutdown regardless of pending operations (doing rollbacks) 3) Shutdown the hard way, requiring recovery on the next start (I think in Oracle this is called a "shutdown abort") Depending on the scenario one may start at step 2) [...] I think RAs should not rely on "stop" being called multiple times for a resource to be stopped. Regards, Ulrich ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
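The three steps sketched above map closely onto PostgreSQL's actual shutdown modes (smart, fast, immediate, selected with pg_ctl's -m flag). A minimal sketch of the escalation a stop action could perform in one pass; the function name and the PGCTL indirection are illustrative, not taken from any shipped agent:

```shell
# Sketch of an escalating stop through PostgreSQL's three shutdown
# modes. PGCTL is a hook so the control flow can be exercised without
# a live server; in a real RA it would simply be pg_ctl.
PGCTL="${PGCTL:-pg_ctl}"

pg_stop_escalate() {
    for mode in smart fast immediate; do     # graceful -> forced -> abort
        # -m selects the shutdown mode, -t bounds how long pg_ctl waits
        if "$PGCTL" stop -D "$PGDATA" -m "$mode" -t 20; then
            echo "stopped with mode=$mode"
            return 0                         # OCF_SUCCESS
        fi
    done
    return 1                                 # OCF_ERR_GENERIC
}
```

In a real agent each step would also get its own share of the overall stop timeout, as the Xen RA does.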
Re: [ClusterLabs] attrd does not clean per-node cache after node removal
On 03/23/2016 12:01 PM, Vladislav Bogdanov wrote: > 23.03.2016 19:52, Vladislav Bogdanov wrote: >> 23.03.2016 19:39, Ken Gaillot wrote: >>> On 03/23/2016 07:35 AM, Vladislav Bogdanov wrote: Hi! It seems like atomic attrd in post-1.1.14 (eb89393) does not fully clean the node cache after a node is removed. I haven't forgotten, this was a tricky one :-) I believe this regression was introduced in da17fd0, which clears the node's attribute *values* when purging the node, but not the value *structures* that contain the node name and ID. That was intended as a fix for when nodes leave and rejoin. However, the same function is used to handle "crm_node -R" requests, which should cause complete removal. I hope to have a fix soon. Note that the behavior may still occur if "crm_node -R" is not called after reloading corosync. >>> Is this a regression? Or have you only tried it with this version? >> >> Only with this one. >> >>> After our QA guys remove node wa-test-server-ha-03 from a two-node cluster: * stop pacemaker and corosync on wa-test-server-ha-03 * remove node wa-test-server-ha-03 from the corosync nodelist on wa-test-server-ha-04 * tune votequorum settings * reload corosync on wa-test-server-ha-04 * remove the node from pacemaker on wa-test-server-ha-04 * delete everything from /var/lib/pacemaker/cib on wa-test-server-ha-03, and then join it with a different corosync ID (but with the same node name), we see the following in the logs: Leave node 1 (wa-test-server-ha-03): Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]: notice: crm_update_peer_proc: Node wa-test-server-ha-03[1] - state is now lost (was member) Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]: notice: Removing all wa-test-server-ha-03 (1) attributes for attrd_peer_change_cb Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]: notice: Lost attribute writer wa-test-server-ha-03 Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]: notice: Removing wa-test-server-ha-03/1 from the membership list Mar 23 04:19:53 
wa-test-server-ha-04 attrd[25962]: notice: Purged 1 peers with id=1 and/or uname=wa-test-server-ha-03 from the membership cache Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]: notice: Processing peer-remove from wa-test-server-ha-04: wa-test-server-ha-03 0 Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]: notice: Removing all wa-test-server-ha-03 (0) attributes for wa-test-server-ha-04 Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]: notice: Removing wa-test-server-ha-03/1 from the membership list Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]: notice: Purged 1 peers with id=0 and/or uname=wa-test-server-ha-03 from the membership cache Join node 3 (the same one, wa-test-server-ha-03, but ID differs): Mar 23 04:21:23 wa-test-server-ha-04 attrd[25962]: notice: crm_update_peer_proc: Node wa-test-server-ha-03[3] - state is now member (was (null)) Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]: warning: crm_find_peer: Node 3/wa-test-server-ha-03 = 0x201bf30 - a4cbcdeb-c36a-4a0e-8ed6-c45b3db89296 Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]: warning: crm_find_peer: Node 2/wa-test-server-ha-04 = 0x1f90e20 - 6c18faa1-f8c2-4b0c-907c-20db450e2e79 Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]: crit: Node 1 and 3 share the same name 'wa-test-server-ha-03' >>> >>> It took me a while to understand the above combination of messages. This >>> is not node 3 joining. This is node 1 joining after node 3 has already >>> been seen. >> >> Hmmm... >> corosync.conf and corosync-cmapctl both say it is 3 >> Also, cib lists it as 3 and lrmd puts its status records under 3. > I mean: > > <node_state ... crm-debug-origin="do_update_resource" in_ccm="true" join="member" expected="member"> > ... > <nvpair ... name="master-rabbitmq-local" value="1"/> > <nvpair ... name="master-meta-0-0-drbd" value="1"/> > <nvpair ... name="master-staging-0-0-drbd" value="1"/> > <nvpair ... value="1458732136"/> > ... >> >> Actually the issue is that drbd resources are not promoted because their >> master attributes go to the section with node-id 1. And that is the only >> reason why we found that. Everything not related to volatile attributes >> works well. >> >>> >>> The warnings are a complete dump of the peer cache. So you can see that >>> wa-test-server-ha-03 is listed only once, with id 3. >>> >>> The critical message ("Node 1 and 3") lists the new id first and the >>> found ID second. So id 1 is what it's trying to add to the cache. >> >> But there is also 'Node 'wa-test-server-ha-03' has changed its ID from 1 >> to 3' - it goes first. Does that matter? >> >>
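The QA removal procedure quoted above can be summarized as a command sequence. Hostnames are the ones from the thread; the corosync.conf editing and votequorum tuning are site-specific and left elided:

```shell
# Sketch of the node-removal sequence described in this thread,
# run from the surviving node (wa-test-server-ha-04).
ssh wa-test-server-ha-03 'systemctl stop pacemaker corosync'
# ... edit the nodelist in /etc/corosync/corosync.conf and adjust
#     votequorum, then make corosync re-read its configuration
#     (the mechanism depends on the corosync version) ...
crm_node -R wa-test-server-ha-03 --force   # purge the node from Pacemaker's caches
```

Per Ken's note above, skipping the final `crm_node -R` after reloading corosync can leave stale entries in the peer cache.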
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Le Thu, 19 May 2016 13:15:20 -0500, Ken Gaillot wrote: > On 05/19/2016 11:43 AM, Jehan-Guillaume de Rorthais wrote: >> Le Thu, 19 May 2016 10:53:31 -0500, >> Ken Gaillot wrote: >> >>> A recent thread discussed a proposed new feature, a new environment >>> variable that would be passed to resource agents, indicating whether a >>> stop action was part of a recovery. >>> >>> Since that thread was long and covered a lot of topics, I'm starting a >>> new one to focus on the core issue remaining: >>> >>> The original idea was to pass the number of restarts remaining before >>> the resource will no longer be started on the same node. This >>> involves calculating (migration-threshold - fail-count), and that >>> implies certain limitations: (1) it will only be set when the cluster >>> checks migration-threshold; (2) it will only be set for the failed >>> resource itself, not for other resources that may be recovered due to >>> dependencies on it. >>> >>> Ulrich Windl proposed an alternative: setting a boolean value instead. I >>> forgot to cc the list on my reply, so I'll summarize now: We would set a >>> new variable like OCF_RESKEY_CRM_recovery=true whenever a start is >>> scheduled after a stop on the same node in the same transition. This >>> would avoid the corner cases of the previous approach; instead of being >>> tied to migration-threshold, it would be set whenever a recovery was >>> being attempted, for any reason. And with this approach, it should be >>> easier to set the variable for all actions on the resource >>> (demote/stop/start/promote), rather than just the stop. >> >> I can see the value of having such a variable during various actions. >> However, we can also deduce that the transition is a recovery during the >> notify actions with the notify variables (the only information we lack is >> the order of the actions). 
A most flexible approach would be to make sure >> the notify variables are always available during the whole transition for >> **all** actions, not just notify. It seems like it's already the case, but >> a recent discussion emphasized that this is just a side effect of the current >> implementation. I understand this as: they were sometimes available outside >> of notification "by accident". > > It does seem that a recovery could be implied from the > notify_{start,stop}_uname variables, but notify variables are only set > for clones that support the notify action. I think the goal here is to > work with any resource type. Even for clones, if they don't otherwise > need notifications, they'd have to add the overhead of notify calls on > all instances, that would do nothing. Exactly: notify variables are only available for clones, presently. What I was suggesting is that the notify variables be always available, whether the resource is a clone, an ms or a standard one. And I didn't mean the notify *action* should be activated all the time for all resources. The notify switch for clones/ms could be kept false by default so the notify action itself is not called during transitions. > > Also, I can see the benefit of having the remaining attempts for the current > > action before hitting the migration-threshold. I might misunderstand > > something here, but it seems to me these are two different pieces of information. > > I think the use cases that have been mentioned would all be happy with > just the boolean. Does anyone need the actual count, or just whether > this is a stop-start vs a full stop? I was thinking of a use case where a graceful demote or stop action failed multiple times, to give the RA a chance to choose another method to stop the resource before it requires a migration. For instance, PostgreSQL has 3 different kinds of stop, the last one not being graceful, but still better than a kill -9. 
> The problem with the migration-threshold approach is that there are > recoveries that will be missed because they don't involve > migration-threshold. If the count is really needed, the > migration-threshold approach is necessary, but if recovery is the really > interesting information, then a boolean would be more accurate. I think I misunderstood the original use cases you are trying to achieve. It seems to me we are talking about a different feature. >> Basically, what we need is a better understanding of the transition itself >> from the RA actions. >> >> If you are still brainstorming on this, as an RA dev, what I would >> suggest is: >> >> * provide and enforce the notify variables in all actions >> * add the actions' order during the current transition to these variables >> using e.g. OCF_RESKEY_CRM_meta_notify_*_actionid > > The action ID would be different for each node being acted on, so it > would be more complicated (maybe *_actions="NODE1:ID1,NODE2:ID2,..."?). Following the principle adopted for other variables, each ID would apply to the corresponding resource and node in OCF_RESKEY_CRM_meta_notify_*_uname and OCF_RESKEY_CRM_meta_notify_*_rsc. > Also, RA writers would need to be aware that some actions may be initiated > in parallel. Probably more complex than it's worth.
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
On 05/19/2016 11:43 AM, Jehan-Guillaume de Rorthais wrote: > Le Thu, 19 May 2016 10:53:31 -0500, > Ken Gaillot wrote: > >> A recent thread discussed a proposed new feature, a new environment >> variable that would be passed to resource agents, indicating whether a >> stop action was part of a recovery. >> >> Since that thread was long and covered a lot of topics, I'm starting a >> new one to focus on the core issue remaining: >> >> The original idea was to pass the number of restarts remaining before >> the resource will no longer be started on the same node. This >> involves calculating (migration-threshold - fail-count), and that >> implies certain limitations: (1) it will only be set when the cluster >> checks migration-threshold; (2) it will only be set for the failed >> resource itself, not for other resources that may be recovered due to >> dependencies on it. >> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I >> forgot to cc the list on my reply, so I'll summarize now: We would set a >> new variable like OCF_RESKEY_CRM_recovery=true whenever a start is >> scheduled after a stop on the same node in the same transition. This >> would avoid the corner cases of the previous approach; instead of being >> tied to migration-threshold, it would be set whenever a recovery was >> being attempted, for any reason. And with this approach, it should be >> easier to set the variable for all actions on the resource >> (demote/stop/start/promote), rather than just the stop. > > I can see the value of having such a variable during various actions. However, > we > can also deduce that the transition is a recovery during the notify actions with > the notify variables (the only information we lack is the order of the > actions). A most flexible approach would be to make sure the notify variables > are always available during the whole transition for **all** actions, not > just > notify. 
It seems like it's already the case, but a recent discussion emphasized > that this is just a side effect of the current implementation. I understand this > as: > they were sometimes available outside of notification "by accident". It does seem that a recovery could be implied from the notify_{start,stop}_uname variables, but notify variables are only set for clones that support the notify action. I think the goal here is to work with any resource type. Even for clones, if they don't otherwise need notifications, they'd have to add the overhead of notify calls on all instances, that would do nothing. > Also, I can see the benefit of having the remaining attempts for the current > action before hitting the migration-threshold. I might misunderstand something > here, but it seems to me these are two different pieces of information. I think the use cases that have been mentioned would all be happy with just the boolean. Does anyone need the actual count, or just whether this is a stop-start vs a full stop? The problem with the migration-threshold approach is that there are recoveries that will be missed because they don't involve migration-threshold. If the count is really needed, the migration-threshold approach is necessary, but if recovery is the really interesting information, then a boolean would be more accurate. > Basically, what we need is a better understanding of the transition itself > from the RA actions. > > If you are still brainstorming on this, as an RA dev, what I would > suggest is: > > * provide and enforce the notify variables in all actions > * add the actions' order during the current transition to these variables > using > e.g. OCF_RESKEY_CRM_meta_notify_*_actionid The action ID would be different for each node being acted on, so it would be more complicated (maybe *_actions="NODE1:ID1,NODE2:ID2,..."?). Also, RA writers would need to be aware that some actions may be initiated in parallel. Probably more complex than it's worth. 
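Where the notify variables *are* available (clones running with notify=true), the inference discussed here can be sketched as follows. The helper name is invented, and the node name is passed as an argument rather than queried with crm_node -n, so the logic stands on its own:

```shell
# Does $1 appear in both the stop list and the start list of this
# transition? If so, the instance on that node is being recovered
# (stop + start) rather than fully stopped. The *_uname variables
# are space-separated node-name lists, as attrd provides them.
is_local_recovery() {
    node=$1
    case " $OCF_RESKEY_CRM_meta_notify_stop_uname " in
        *" $node "*) ;;           # scheduled to stop here: keep checking
        *) return 1 ;;            # not stopping here: not a recovery
    esac
    case " $OCF_RESKEY_CRM_meta_notify_start_uname " in
        *" $node "*) return 0 ;;  # also starting here: stop-start cycle
        *) return 1 ;;            # stop without start: full stop
    esac
}
```

A node in the stop list but not the start list is being fully stopped; a node in both is going through a stop-start cycle.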
> * add a new variable with the remaining action attempts before migration. This > one > has the advantage of surviving the transition breakage when a failure > occurs. > > As a second step, we would be able to provide some helper functions in > ocf_shellfuncs (and in my Perl module equivalent) to compute whether the transition > is a switchover, a failover, a recovery, etc., based on the notify variables. > > Presently, I am detecting such scenarios directly in my RA during the notify > actions and tracking them as private attributes to be aware of the situation > during the real actions (demote and stop). See: > > https://github.com/dalibo/PAF/blob/952cb3cf2f03aad18fbeafe3a91f997a56c3b606/script/pgsqlms#L95 > > Regards, >
Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Le Thu, 19 May 2016 10:53:31 -0500, Ken Gaillot wrote: > A recent thread discussed a proposed new feature, a new environment > variable that would be passed to resource agents, indicating whether a > stop action was part of a recovery. > > Since that thread was long and covered a lot of topics, I'm starting a > new one to focus on the core issue remaining: > > The original idea was to pass the number of restarts remaining before > the resource will no longer be started on the same node. This > involves calculating (migration-threshold - fail-count), and that > implies certain limitations: (1) it will only be set when the cluster > checks migration-threshold; (2) it will only be set for the failed > resource itself, not for other resources that may be recovered due to > dependencies on it. > > Ulrich Windl proposed an alternative: setting a boolean value instead. I > forgot to cc the list on my reply, so I'll summarize now: We would set a > new variable like OCF_RESKEY_CRM_recovery=true whenever a start is > scheduled after a stop on the same node in the same transition. This > would avoid the corner cases of the previous approach; instead of being > tied to migration-threshold, it would be set whenever a recovery was > being attempted, for any reason. And with this approach, it should be > easier to set the variable for all actions on the resource > (demote/stop/start/promote), rather than just the stop. I can see the value of having such a variable during various actions. However, we can also deduce that the transition is a recovery during the notify actions with the notify variables (the only information we lack is the order of the actions). A most flexible approach would be to make sure the notify variables are always available during the whole transition for **all** actions, not just notify. It seems like it's already the case, but a recent discussion emphasized that this is just a side effect of the current implementation. 
I understand this as: they were sometimes available outside of notification "by accident". Also, I can see the benefit of having the remaining attempts for the current action before hitting the migration-threshold. I might misunderstand something here, but it seems to me these are two different pieces of information. Basically, what we need is a better understanding of the transition itself from the RA actions. If you are still brainstorming on this, as an RA dev, what I would suggest is: * provide and enforce the notify variables in all actions * add the actions' order during the current transition to these variables using e.g. OCF_RESKEY_CRM_meta_notify_*_actionid * add a new variable with the remaining action attempts before migration. This one has the advantage of surviving the transition breakage when a failure occurs. As a second step, we would be able to provide some helper functions in ocf_shellfuncs (and in my Perl module equivalent) to compute whether the transition is a switchover, a failover, a recovery, etc., based on the notify variables. Presently, I am detecting such scenarios directly in my RA during the notify actions and tracking them as private attributes to be aware of the situation during the real actions (demote and stop). See: https://github.com/dalibo/PAF/blob/952cb3cf2f03aad18fbeafe3a91f997a56c3b606/script/pgsqlms#L95 Regards,
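The private-attribute tracking described above (see the linked PAF source for the real thing) boils down to the following pattern. The attribute name is illustrative, and attrd_updater's --private flag requires a reasonably recent Pacemaker:

```shell
# In the notify action, when the notify variables reveal a recovery,
# remember it as a private (non-CIB-persisted) node attribute:
attrd_updater --private -n "${OCF_RESOURCE_INSTANCE}-recovering" -U 1

# Later, in the real demote/stop action, consult it:
val=$(attrd_updater --private -n "${OCF_RESOURCE_INSTANCE}-recovering" -Q 2>/dev/null)
```

This is exactly the kind of bookkeeping the proposed variable would make unnecessary.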
[ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
A recent thread discussed a proposed new feature, a new environment variable that would be passed to resource agents, indicating whether a stop action was part of a recovery. Since that thread was long and covered a lot of topics, I'm starting a new one to focus on the core issue remaining: The original idea was to pass the number of restarts remaining before the resource will no longer be started on the same node. This involves calculating (migration-threshold - fail-count), and that implies certain limitations: (1) it will only be set when the cluster checks migration-threshold; (2) it will only be set for the failed resource itself, not for other resources that may be recovered due to dependencies on it. Ulrich Windl proposed an alternative: setting a boolean value instead. I forgot to cc the list on my reply, so I'll summarize now: We would set a new variable like OCF_RESKEY_CRM_recovery=true whenever a start is scheduled after a stop on the same node in the same transition. This would avoid the corner cases of the previous approach; instead of being tied to migration-threshold, it would be set whenever a recovery was being attempted, for any reason. And with this approach, it should be easier to set the variable for all actions on the resource (demote/stop/start/promote), rather than just the stop. I think the boolean approach fits all the envisioned use cases that have been discussed. Any objections to going that route instead of the count? -- Ken Gaillot
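From an RA author's perspective, the boolean would be consumed roughly like this. Note that OCF_RESKEY_CRM_recovery is only a proposal in this thread and is not set by any released Pacemaker:

```shell
# Sketch: let a stop action pick its shutdown strategy based on the
# *proposed* recovery flag (variable name taken from the message above).
choose_stop_mode() {
    if [ "${OCF_RESKEY_CRM_recovery:-false}" = "true" ]; then
        # A start is scheduled right after this stop on the same node:
        # a faster, less graceful shutdown is an acceptable trade-off.
        echo "fast"
    else
        # Full stop with no restart pending: shut down as cleanly as possible.
        echo "smart"
    fi
}
```

The mode names echo PostgreSQL's shutdown modes, but any RA could branch the same way.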
Re: [ClusterLabs] Node attributes
On 05/18/2016 10:49 PM, H Yavari wrote: > Hi, > > How can I define a constraint for two resources based on one node > attribute? > > For example, resources X and Y are co-located based on node attribute Z. > > > > Regards, > H.Yavari Hi, See http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617356537136 High-level tools such as pcs and crm provide a simpler interface, but the concepts will be the same. This works for location constraints, not colocation, but you can easily accomplish what you want. If your goal is that X and Y each can only run on a node with attribute Z, then set up a location constraint for each one using the appropriate rule. If your goal is that X and Y must be colocated together, on a node with attribute Z, then set up a regular colocation constraint between them, and a location constraint for one of them with the appropriate rule; or, put them in a group, and set up a location constraint for the group with the appropriate rule.
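For the second option (X and Y colocated, pinned by a node-attribute rule), a pcs sketch could look like the following. X, Y and Z are the placeholders from the question, and the rule syntax should be double-checked against your pcs version:

```shell
# Keep Y wherever X runs:
pcs constraint colocation add Y with X INFINITY
# Ban X (and therefore Y) from any node where attribute Z is absent
# or not equal to "1":
pcs constraint location X rule score=-INFINITY not_defined Z or Z ne 1
# The attribute itself is set per node, e.g.:
crm_attribute --node node1 --name Z --update 1
```

crm's configure syntax expresses the same rule with a `location` statement and an inline `rule` clause.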
[ClusterLabs] Antw: Pacemaker restart resources when node joins cluster after failback
>>> Dharmesh wrote on 19.05.2016 at 13:18 in >>> message : > Hi, > > I have a two-node Debian cluster with resources configured in it. > Everything is working fine apart from one thing. Usually you find the reasons in the logs (syslog, cluster log, etc.). > > Whenever one of my two nodes joins the cluster, the resources configured > on the currently active node get restarted. I am not able to figure out why > the cluster is behaving like this. > > Below is the configuration of my cluster > > node $id="775bad88-0954-40bf-b9e4-4f012a76a34c" testsrv2 \ > attributes standby="off" > node $id="b1d07507-6191-425c-bee3-14229c85820f" testsrv1 \ > attributes standby="off" > primitive ClusterIp ocf:heartbeat:IPaddr2 \ > params ip="192.168.120.209" nic="eth0" cidr_netmask="24" \ > op monitor start-delay="0" interval="30" \ > meta target-role="started" > primitive DBClusterIp ocf:heartbeat:IPaddr2 \ > params ip="192.168.120.210" nic="eth0" cidr_netmask="24" \ > op monitor interval="30" start-delay="0" \ > meta target-role="started" > primitive Postgres-9.3 lsb:postgres-9.3-openscg \ > op start interval="0" timeout="15" \ > op stop interval="0" timeout="15" \ > op monitor interval="15" timeout="15" start-delay="15" \ > meta target-role="started" migration-threshold="1" > primitive PowerDns lsb:pdns \ > op start interval="0" timeout="15" \ > op stop interval="0" timeout="15" \ > op monitor interval="15" timeout="15" start-delay="15" \ > meta target-role="started" migration-threshold="2" > primitive PsqlMasterToStandby ocf:heartbeat:PsqlMasterToStandby \ > op start interval="0" timeout="20" start-delay="10" \ > op monitor interval="10" timeout="240" start-delay="10" \ > op stop interval="0" timeout="20" \ > meta target-role="started" > primitive PsqlPromote ocf:heartbeat:PsqlPromote \ > op start interval="0" timeout="20" \ > op stop interval="0" timeout="20" \ > op monitor interval="10" timeout="20" start-delay="10" \ > meta target-role="started" > group Database Postgres-9.3 
PsqlPromote > colocation col_Database_DBClusterIp inf: Database DBClusterIp > colocation col_Database_PsqlMasterToStandby inf: Database > PsqlMasterToStandby > colocation col_PowerDns_ClusterIp inf: PowerDns ClusterIp > order ord_Database_DBClusterIp inf: Database DBClusterIp > order ord_Database_PsqlMasterToStandby inf: Database PsqlMasterToStandby > order ord_PowerDns_ClusterIp inf: PowerDns ClusterIp > property $id="cib-bootstrap-options" \ > stonith-enabled="false" \ > dc-version="1.1.10-42f2063" \ > cluster-infrastructure="heartbeat" \ > last-lrm-refresh="1453192778" > rsc_defaults $id="rsc-options" \ > resource-stickiness="100" \ > failure-timeout="60s" > #vim:set syntax=pcmk > > Let me know if my configuration is not appropriate or some new > configuration needs to be done. > > Thanks and regards, > > -- > Dharmesh Kumar
[ClusterLabs] Pacemaker restart resources when node joins cluster after failback
Hi, I have a two-node Debian cluster with resources configured in it. Everything is working fine apart from one thing. Whenever one of my two nodes joins the cluster, the resources configured on the currently active node get restarted. I am not able to figure out why the cluster is behaving like this. Below is the configuration of my cluster node $id="775bad88-0954-40bf-b9e4-4f012a76a34c" testsrv2 \ attributes standby="off" node $id="b1d07507-6191-425c-bee3-14229c85820f" testsrv1 \ attributes standby="off" primitive ClusterIp ocf:heartbeat:IPaddr2 \ params ip="192.168.120.209" nic="eth0" cidr_netmask="24" \ op monitor start-delay="0" interval="30" \ meta target-role="started" primitive DBClusterIp ocf:heartbeat:IPaddr2 \ params ip="192.168.120.210" nic="eth0" cidr_netmask="24" \ op monitor interval="30" start-delay="0" \ meta target-role="started" primitive Postgres-9.3 lsb:postgres-9.3-openscg \ op start interval="0" timeout="15" \ op stop interval="0" timeout="15" \ op monitor interval="15" timeout="15" start-delay="15" \ meta target-role="started" migration-threshold="1" primitive PowerDns lsb:pdns \ op start interval="0" timeout="15" \ op stop interval="0" timeout="15" \ op monitor interval="15" timeout="15" start-delay="15" \ meta target-role="started" migration-threshold="2" primitive PsqlMasterToStandby ocf:heartbeat:PsqlMasterToStandby \ op start interval="0" timeout="20" start-delay="10" \ op monitor interval="10" timeout="240" start-delay="10" \ op stop interval="0" timeout="20" \ meta target-role="started" primitive PsqlPromote ocf:heartbeat:PsqlPromote \ op start interval="0" timeout="20" \ op stop interval="0" timeout="20" \ op monitor interval="10" timeout="20" start-delay="10" \ meta target-role="started" group Database Postgres-9.3 PsqlPromote colocation col_Database_DBClusterIp inf: Database DBClusterIp colocation col_Database_PsqlMasterToStandby inf: Database PsqlMasterToStandby colocation col_PowerDns_ClusterIp inf: PowerDns ClusterIp order 
ord_Database_DBClusterIp inf: Database DBClusterIp order ord_Database_PsqlMasterToStandby inf: Database PsqlMasterToStandby order ord_PowerDns_ClusterIp inf: PowerDns ClusterIp property $id="cib-bootstrap-options" \ stonith-enabled="false" \ dc-version="1.1.10-42f2063" \ cluster-infrastructure="heartbeat" \ last-lrm-refresh="1453192778" rsc_defaults $id="rsc-options" \ resource-stickiness="100" \ failure-timeout="60s" #vim:set syntax=pcmk Let me know if my configuration is not appropriate or some new configuration needs to be done. Thanks and regards, -- Dharmesh Kumar