Hi All,

We tested a failure of a clone resource using the following procedure.
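For reference, the resource configuration is roughly of the following shape. This is only a simplified sketch in crm shell syntax, not our exact crm configuration: the resource and clone names match the crm_mon output below, but the ping parameters, scores, and constraint IDs are illustrative, and the real settings are in the hb_report attached to the Bugzilla entry at the end of this mail.

 primitive main_rsc ocf:pacemaker:Dummy \
         op monitor interval="10s"
 primitive main_rsc2 ocf:pacemaker:Dummy \
         op monitor interval="10s"
 primitive prmDummy1 ocf:pacemaker:Dummy \
         meta migration-threshold="1" \
         op monitor interval="10s"
 primitive prmPingd ocf:pacemaker:ping \
         params name="pingd" host_list="192.168.0.1" \
         op monitor interval="10s"
 clone clnDummy1 prmDummy1
 clone clnPingd prmPingd
 # main_rsc/main_rsc2 depend on the pingd attribute, so they stop when it disappears
 location loc-main_rsc main_rsc \
         rule -inf: not_defined pingd or pingd lte 0
 location loc-main_rsc2 main_rsc2 \
         rule -inf: not_defined pingd or pingd lte 0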
Step 1) We start a three-node cluster.

============
Last updated: Thu Mar 31 10:01:47 2011
Stack: Heartbeat
Current DC: srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311) - partition with quorum
Version: 1.0.10-9342a4147fc69f2081f8563a34509da5be0a89d0
3 Nodes configured, unknown expected votes
4 Resources configured.
============

Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
        main_rsc        (ocf::pacemaker:Dummy) Started
        prmDummy1:0     (ocf::pacemaker:Dummy) Started
        prmPingd:0      (ocf::pacemaker:ping) Started
Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
        prmDummy1:1     (ocf::pacemaker:Dummy) Started
        main_rsc2       (ocf::pacemaker:Dummy) Started
        prmPingd:1      (ocf::pacemaker:ping) Started
Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
        prmDummy1:2     (ocf::pacemaker:Dummy) Started
        prmPingd:2      (ocf::pacemaker:ping) Started

Inactive resources:

Migration summary:
* Node srv01:  pingd=1
* Node srv03:  pingd=1
* Node srv02:  pingd=1

Step 2) On node srv01, we cause a failure of the clone resource.

[root@srv01 ~]# rm -rf /var/run/Dummy-prmDummy1.state

Step 3) On node srv02, the pingd clone is restarted. As a consequence of that restart, main_rsc2 is restarted as well.
* In addition, the clone instance numbers become inconsistent.

[root@srv02 ~]# tail -f /var/log/ha-log | grep stop
Mar 31 10:02:22 srv02 crmd: [24471]: info: do_lrm_rsc_op: Performing key=29:4:0:6c32b0f8-d37a-4ebc-8365-30e2e02ba9d3 op=prmPingd:1_stop_0 )
Mar 31 10:02:25 srv02 lrmd: [24468]: info: rsc:prmPingd:1:12: stop
Mar 31 10:02:25 srv02 crmd: [24471]: info: process_lrm_event: LRM operation prmPingd:1_stop_0 (call=12, rc=0, cib-update=21, confirmed=true) ok
Mar 31 10:02:33 srv02 crmd: [24471]: info: do_lrm_rsc_op: Performing key=9:5:0:6c32b0f8-d37a-4ebc-8365-30e2e02ba9d3 op=main_rsc2_stop_0 )
Mar 31 10:02:33 srv02 lrmd: [24468]: info: rsc:main_rsc2:14: stop
Mar 31 10:02:33 srv02 crmd: [24471]: info: process_lrm_event: LRM operation main_rsc2_stop_0 (call=14, rc=0, cib-update=23, confirmed=true) ok

============
Last updated: Thu Mar 31 10:02:40 2011
Stack: Heartbeat
Current DC: srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311) - partition with quorum
Version: 1.0.10-9342a4147fc69f2081f8563a34509da5be0a89d0
3 Nodes configured, unknown expected votes
4 Resources configured.
============

Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
        prmDummy1:1     (ocf::pacemaker:Dummy) Started ---------> :1 (odd instance number)
        prmPingd:0      (ocf::pacemaker:ping) Started  ---------> :0 (odd instance number)
Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
        main_rsc        (ocf::pacemaker:Dummy) Started
        prmDummy1:2     (ocf::pacemaker:Dummy) Started ---------> :2 (odd instance number)
        prmPingd:1      (ocf::pacemaker:ping) Started  ---------> :1 (odd instance number)

Inactive resources:

 main_rsc2      (ocf::pacemaker:Dummy): Stopped
 Clone Set: clnDummy1
     Started: [ srv02 srv03 ]
     Stopped: [ prmDummy1:0 ]
 Clone Set: clnPingd
     Started: [ srv02 srv03 ]
     Stopped: [ prmPingd:2 ]

Migration summary:
* Node srv01:  prmDummy1:0: migration-threshold=1 fail-count=1
* Node srv03:  pingd=1
* Node srv02:  pingd=1

Failed actions:
    prmDummy1:0_monitor_10000 (node=srv01, call=8, rc=7, status=complete): not running

We think the restart of pingd on node srv02 is unnecessary. Is there a way to resolve this problem?

The following bug may possibly be related:
 * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2508

I registered the logs in Bugzilla (hb_report attached):
 * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2574

Best Regards,
Hideo Yamauchi.
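P.S. A note on Step 2, for anyone reproducing this: removing the state file triggers a monitor failure because the monitor of ocf:pacemaker:Dummy only checks whether its state file exists. Roughly like the following (a simplified sketch, not the actual agent code; the agent normally derives the path from its "state" parameter and instance name, and here it is written out as the file we removed):

    dummy_monitor() {
        # the state file is created by start and removed by stop
        if [ -f /var/run/Dummy-prmDummy1.state ]; then
            return $OCF_SUCCESS      # 0: the resource appears to be running
        fi
        return $OCF_NOT_RUNNING      # 7: matches the rc=7 "not running" failure above
    }

So the next monitor on srv01 returns rc=7, the fail-count reaches migration-threshold=1, and the recovery shown in Step 3 begins.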