I'm reasonably sure this was a bug we fixed in 1.0.12. Could you update and retest?
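
In case it helps, here is a quick way to confirm which version each node is actually running before and after the upgrade (the CIB you posted reports dc-version 1.0.11). The package commands are only examples; adjust them for your distribution:

    # version as reported by the cluster itself
    crm_mon -1 | grep -i version
    cibadmin -Q | grep dc-version

    # after installing 1.0.12 (e.g. "yum update pacemaker" or
    # "apt-get install pacemaker", depending on the distribution),
    # restart heartbeat on each node and re-run the shutdown test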
On Tue, Feb 7, 2012 at 9:35 PM, neha chatrath <nehachatr...@gmail.com> wrote:
> Hello,
> I have a 2-node cluster with the following configuration:
>
> node $id="9e53a111-0dca-496c-9461-a38f3eec4d0e" mcg2 \
>     attributes standby="off"
> node $id="a90981f8-d993-4411-89f4-aff7156136d2" mcg1 \
>     attributes standby="off"
> primitive ClusterIP ocf:mcg:MCG_VIPaddr_RA \
>     params ip="192.168.115.50" cidr_netmask="255.255.255.0" nic="bond1.115:1" \
>     op monitor interval="40" timeout="20" \
>     meta target-role="Started"
> primitive EMS ocf:heartbeat:jboss \
>     params jboss_home="/opt/jboss-5.1.0.GA" java_home="/opt/jdk1.6.0_29/" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="240" \
>     op monitor interval="30s" timeout="40s"
> primitive NDB_MGMT ocf:mcg:NDB_MGM_RA \
>     op monitor interval="120" timeout="120"
> primitive NDB_VIP ocf:heartbeat:IPaddr2 \
>     params ip="192.168.117.50" cidr_netmask="255.255.255.255" nic="bond0.117:1" \
>     op monitor interval="30" timeout="10"
> primitive Rmgr ocf:mcg:RM_RA \
>     op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> primitive Tmgr ocf:mcg:TM_RA \
>     op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> primitive mysql ocf:mcg:MYSQLD_RA \
>     op monitor interval="180" timeout="200"
> primitive ndbd ocf:mcg:NDBD_RA \
>     op monitor interval="120" timeout="120"
> primitive pimd ocf:mcg:PIMD_RA \
>     op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> ms ms_Rmgr Rmgr \
>     meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" interleave="true" notify="true"
> ms ms_Tmgr Tmgr \
>     meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" interleave="true" notify="true"
> ms ms_pimd pimd \
>     meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" interleave="true" notify="true"
> clone EMS_CLONE EMS \
>     meta globally-unique="false" clone-max="2" clone-node-max="1" target-role="Started"
> clone mysqld_clone mysql \
>     meta globally-unique="false" clone-max="2" clone-node-max="1"
> clone ndbdclone ndbd \
>     meta globally-unique="false" clone-max="2" clone-node-max="1" target-role="Started"
> colocation ip_with_Pimd inf: ClusterIP ms_pimd:Master
> colocation ip_with_RM inf: ClusterIP ms_Rmgr:Master
> colocation ip_with_TM inf: ClusterIP ms_Tmgr:Master
> colocation ndb_vip-with-ndb_mgm inf: NDB_MGMT NDB_VIP
> order RM-after-mysqld inf: mysqld_clone ms_Rmgr
> order TM-after-RM inf: ms_Rmgr ms_Tmgr
> order ip-after-pimd inf: ms_pimd ClusterIP
> order mysqld-after-ndbd inf: ndbdclone mysqld_clone
> order pimd-after-TM inf: ms_Tmgr ms_pimd
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.11-55a5f5be61c367cbd676c2f0ec4f1c62b38223d7" \
>     cluster-infrastructure="Heartbeat" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
>     migration_threshold="3" \
>     resource-stickiness="100"
>
> With both nodes up and running, if the heartbeat service is stopped on either node, the following resources are restarted on the other node:
> mysqld_clone, ms_Rmgr, ms_Tmgr, ms_pimd, ClusterIP
>
> From the Heartbeat debug logs, it seems the policy engine is initiating a restart of the above resources, but the reason for this is not clear.
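
One way to see exactly why the policy engine chose to restart those resources is to replay the transition it calculated, using the pe-input file it wrote for that run. A rough sketch, assuming the default /var/lib/pengine/ location used by Heartbeat-based 1.0 installs; the file name below is only an example, substitute whichever pe-input the 11:06:31 log entries refer to:

    # the pengine log normally names the pe-input file it saved for the transition
    ls -lt /var/lib/pengine/ | head

    # replay it and show the allocation scores; repeat -V for more detail
    # (option names as in the 1.0 series; double-check with ptest --help)
    bunzip2 -k /var/lib/pengine/pe-input-100.bz2
    ptest -x /var/lib/pengine/pe-input-100 -s -VV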
>
> Following are some excerpts from the logs:
>
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: determine_online_status: Node mcg2 is shutting down
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: determine_online_status: Node mcg1 is online
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print: Master/Slave Set: ms_Rmgr
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Rmgr:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Rmgr:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Rmgr:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Rmgr:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Masters: [ mcg1 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Slaves: [ mcg2 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print: Master/Slave Set: ms_Tmgr
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Tmgr:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Tmgr:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Tmgr:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource Tmgr:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Masters: [ mcg1 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Slaves: [ mcg2 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print: Master/Slave Set: ms_pimd
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource pimd:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource pimd:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource pimd:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource pimd:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Masters: [ mcg1 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Slaves: [ mcg2 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: native_print: ClusterIP (ocf::mcg:MCG_VIPaddr_RA): Started mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print: Clone Set: EMS_CLONE
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource EMS:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource EMS:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource EMS:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource EMS:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Started: [ mcg1 mcg2 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: native_print: NDB_VIP (ocf::heartbeat:IPaddr2): Started mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: native_print: NDB_MGMT (ocf::mcg:NDB_MGM_RA): Started mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print: Clone Set: mysqld_clone
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource mysql:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource mysql:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource mysql:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource mysql:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Started: [ mcg1 mcg2 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print: Clone Set: ndbdclone
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource ndbd:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource ndbd:0 active on mcg1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource ndbd:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource ndbd:1 active on mcg2
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print: Started: [ mcg1 mcg2 ]
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource Rmgr:1: preferring current location (node=mcg2, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource Tmgr:1: preferring current location (node=mcg2, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource pimd:1: preferring current location (node=mcg2, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource EMS:1: preferring current location (node=mcg2, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource mysql:1: preferring current location (node=mcg2, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource ndbd:1: preferring current location (node=mcg2, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource Rmgr:0: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource Tmgr:0: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource pimd:0: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource ClusterIP: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource EMS:0: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource NDB_VIP: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource NDB_MGMT: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource mysql:0: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness: Resource ndbd:0: preferring current location (node=mcg1, weight=100)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to Rmgr:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes for resource Rmgr:1 are unavailable, unclean or shutting down (mcg2: 0, -1000000)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for Rmgr:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource Rmgr:1 cannot run anywhere
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1 ms_Rmgr instances of a possible 2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Rmgr:0 master score: 10
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: Promoting Rmgr:0 (Master mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Rmgr:1 master score: 0
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: ms_Rmgr: Promoted 1 instances of a possible 1 to master
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to Tmgr:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes for resource Tmgr:1 are unavailable, unclean or shutting down (mcg2: 0, -1000000)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for Tmgr:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource Tmgr:1 cannot run anywhere
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1 ms_Tmgr instances of a possible 2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Tmgr:0 master score: 10
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: Promoting Tmgr:0 (Master mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Tmgr:1 master score: 0
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: ms_Tmgr: Promoted 1 instances of a possible 1 to master
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to pimd:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes for resource pimd:1 are unavailable, unclean or shutting down (mcg2: 0, -1000000)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for pimd:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource pimd:1 cannot run anywhere
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1 ms_pimd instances of a possible 2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: pimd:0 master score: 10
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: Promoting pimd:0 (Master mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: pimd:1 master score: 0
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: ms_pimd: Promoted 1 instances of a possible 1 to master
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to ClusterIP
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to EMS:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes for resource EMS:1 are unavailable, unclean or shutting down (mcg2: 0, -1000000)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for EMS:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource EMS:1 cannot run anywhere
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1 EMS_CLONE instances of a possible 2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to NDB_VIP
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to NDB_MGMT
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to mysql:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes for resource mysql:1 are unavailable, unclean or shutting down (mcg2: 0, -1000000)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for mysql:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource mysql:1 cannot run anywhere
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1 mysqld_clone instances of a possible 2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning mcg1 to ndbd:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes for resource ndbd:1 are unavailable, unclean or shutting down (mcg2: 0, -1000000)
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for ndbd:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource ndbd:1 cannot run anywhere
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1 ndbdclone instances of a possible 2
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_create_actions: Creating actions for ms_Rmgr
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_create_actions: Creating actions for ms_Tmgr
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_create_actions: Creating actions for ms_pimd
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: stage6: Scheduling Node mcg2 for shutdown
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing Rmgr:0 with Tmgr:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: find_compatible_child: Can't pair Tmgr:1 with ms_Rmgr
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: No match found for Tmgr:1 (0)
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: clone_rsc_order_lh: Inhibiting Tmgr:1 from being active
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for Tmgr:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing Tmgr:0 with Rmgr:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing Tmgr:1 with Rmgr:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing Tmgr:0 with pimd:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: find_compatible_child: Can't pair pimd:1 with ms_Tmgr
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: No match found for pimd:1 (0)
> Feb 07 11:06:31 MCG1 pengine: [20534]: info: clone_rsc_order_lh: Inhibiting pimd:1 from being active
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not allocate a node for pimd:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing pimd:0 with Tmgr:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing pimd:1 with Tmgr:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing Rmgr:0 with mysql:0
> Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing Rmgr:1 with mysql:1
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource Rmgr:0 (Master mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop resource Rmgr:1 (mcg2)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource Tmgr:0 (Master mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop resource Tmgr:1 (mcg2)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource pimd:0 (Master mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop resource pimd:1 (mcg2)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource ClusterIP (Started mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave resource EMS:0 (Started mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop resource EMS:1 (mcg2)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave resource NDB_VIP (Started mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave resource NDB_MGMT (Started mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource mysql:0 (Started mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop resource mysql:1 (mcg2)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave resource ndbd:0 (Started mcg1)
> Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop resource ndbd:1 (mcg2)
>
> Thanks in advance.
>
> Regards
> Neha Chatrath

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org