On Tue, Sep 27, 2011 at 2:31 PM, Junko IKEDA <tsukishima...@gmail.com> wrote:
> Hi,
>
>> Which version did you check?
>
> Pacemaker 1.0.11.

I meant which version of 1.1, since you said: "Pacemaker 1.1 shows the same behavior."

>
>> The latest from git seems to work fine:
>>
>> Current cluster status:
>> Online: [ bl460g1n13 bl460g1n14 ]
>>
>>  Resource Group: grpDRBD
>>      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>>      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>>      dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>>  Master/Slave Set: msDRBD [prmDRBD]
>>      Masters: [ bl460g1n13 ]
>>      Slaves: [ bl460g1n14 ]
>>
>> Transition Summary:
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0 (Master bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1 (Slave bl460g1n14)
>>
>> Executing cluster transition:
>>  * Executing action 14: dummy03_stop_0 on bl460g1n13
>>  * Executing action 12: dummy02_stop_0 on bl460g1n13
>>  * Executing action 2: dummy01_stop_0 on bl460g1n13
>>  * Executing action 11: dummy01_start_0 on bl460g1n13
>>  * Executing action 1: dummy01_monitor_10000 on bl460g1n13
>>  * Executing action 13: dummy02_start_0 on bl460g1n13
>>  * Executing action 3: dummy02_monitor_10000 on bl460g1n13
>>  * Executing action 15: dummy03_start_0 on bl460g1n13
>>  * Executing action 4: dummy03_monitor_10000 on bl460g1n13
>
> dummy01 got the fail-count,
> so dummy01 should move from bl460g1n13 to bl460g1n14.
> Why does it re-start on the failure node?
>
> I got the latest changeset from hg;
>
> # hg log | head -n 7
> changeset:   15777:a15ead49e20f
> branch:      stable-1.0
> tag:         tip
> user:        Andrew Beekhof <and...@beekhof.net>
> date:        Thu Aug 25 16:49:59 2011 +1000
> summary:     changeset: 15775:fe18a1ad46f8
>
> # crm
> crm(live)# cib import pe-input-7.bz2
> crm(pe-input-7)# configure ptest vvv
> ptest[19194]: 2011/09/27_11:53:45 notice: unpack_config: On loss of CCM Quorum: Ignore
> ptest[19194]: 2011/09/27_11:53:45 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> ptest[19194]: 2011/09/27_11:53:45 notice: group_print: Resource Group: grpDRBD
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: clone_print: Master/Slave Set: msDRBD
> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Masters: [ bl460g1n13 ]
> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Slaves: [ bl460g1n14 ]
> ptest[19194]: 2011/09/27_11:53:45 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy01 (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy02 (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy03 (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:0 (Master bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:1 (Slave bl460g1n14)
> INFO: install graphviz to see a transition graph
> crm(pe-input-7)# quit
>
>
> reverts to Pacemaker 1.0.11,
>
> # hg revert -a -r b2e39d318fda
> # make install
>
> # crm
> crm(live)# cib import pe-input-7.bz2
> crm(pe-input-7)# configure ptest vvv
> ptest[751]: 2011/09/27_11:57:50 notice: unpack_config: On loss of CCM Quorum: Ignore
> ptest[751]: 2011/09/27_11:57:50 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> ptest[751]: 2011/09/27_11:57:50 notice: group_print: Resource Group: grpDRBD
> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: clone_print: Master/Slave Set: msDRBD
> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Masters: [ bl460g1n13 ]
> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Slaves: [ bl460g1n14 ]
> ptest[751]: 2011/09/27_11:57:50 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy01 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy02 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy03 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy02 (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy03 (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1 (Slave -> Master bl460g1n14)
> INFO: install graphviz to see a transition graph
>
> Pacemaker 1.0.10 moved the failure resource to the other node.
> It's the expected behavior.
>
> I attached the hb_report which includes the above pe-input-7.bz2.
>
> Thanks,
> Junko
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
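
To compare what two policy engine runs plan for the same pe-input without reading the logs by eye, a small script along these lines might help. It is only a sketch: the regular expression is inferred from the LogActions lines quoted in this thread, `planned_actions` and `diff_actions` are made-up helper names rather than anything Pacemaker ships, and log lines wrapped by the mailer have to be rejoined before parsing.

```python
import re

# Pattern inferred from the LogActions lines quoted in this thread.
# The word "resource" shows up in some of the 1.0 ptest lines but not in
# the crm_simulate lines, so it is treated as optional (an assumption).
LOG_ACTION = re.compile(
    r"LogActions:\s+(\w+)\s+(?:resource\s+)?(\S+)\s+\((.*?)\)"
)


def planned_actions(log_text):
    """Return {resource: (action, detail)} for every LogActions line."""
    actions = {}
    for line in log_text.splitlines():
        match = LOG_ACTION.search(line)
        if match:
            action, resource, detail = match.groups()
            actions[resource] = (action, detail)
    return actions


def diff_actions(old, new):
    """Return only the resources whose planned action differs."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Two lines from each run above, with the mail wrapping undone.
TIP = """\
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy01 (bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:0 (Master bl460g1n13)
"""
REVERTED = """\
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)
"""

if __name__ == "__main__":
    changes = diff_actions(planned_actions(TIP), planned_actions(REVERTED))
    for name, (before, after) in changes.items():
        print(name, before, "=>", after)
```

Feeding it the full output of both runs would list every resource whose planned action changed between the two builds, which is the difference being asked about here (Stop versus Move for the group, Leave versus Demote/Promote for prmDRBD).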