On Fri, Sep 30, 2011 at 3:44 PM, Junko IKEDA <tsukishima...@gmail.com> wrote:
> Hi,
>
> sorry for the confusion.
>
> Pacemaker 1.0.10 OK (group resource can fail over)
> Pacemaker 1.0.11 NG (group resource just stops, cannot fail over)
> Pacemaker 1.1 <- the latest hg (group resource just stops, cannot fail over)

We've actually moved 1.1 over to git:
http://www.clusterlabs.org/wiki/Contributing_Patches

I should mark that somehow in the HG tree.

>
> By the way, your simulation showed dummy01 restart on bl460g1n13 again,
> but dummy01 failed on bl460g1n13, so dummy01 should move to bl460g1n14.

Hmmm. True. I'll take another look.

> Current cluster status:
> Online: [ bl460g1n13 bl460g1n14 ]
>
>  Resource Group: grpDRBD
>      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>      dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>  Master/Slave Set: msDRBD [prmDRBD]
>      Masters: [ bl460g1n13 ]
>      Slaves: [ bl460g1n14 ]
>
> Transition Summary:
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0 (Master bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1 (Slave bl460g1n14)
>
> Executing cluster transition:
>  * Executing action 14: dummy03_stop_0 on bl460g1n13
>  * Executing action 12: dummy02_stop_0 on bl460g1n13
>  * Executing action 2: dummy01_stop_0 on bl460g1n13
>  * Executing action 11: dummy01_start_0 on bl460g1n13
>  * Executing action 1: dummy01_monitor_10000 on bl460g1n13
>  * Executing action 13: dummy02_start_0 on bl460g1n13
>  * Executing action 3: dummy02_monitor_10000 on bl460g1n13
>  * Executing action 15: dummy03_start_0 on bl460g1n13
>  * Executing action 4: dummy03_monitor_10000 on bl460g1n13
>
>
> Thanks,
> Junko
>
>
>
> 2011/9/29 Andrew Beekhof <and...@beekhof.net>:
>> On Tue, Sep 27, 2011 at 2:31 PM, Junko IKEDA <tsukishima...@gmail.com> wrote:
>>> Hi,
>>>
>>>> Which version did you check?
>>>
>>> Pacemaker 1.0.11.
>>
>> I meant which version of 1.1, since you said:
>>
>>   "Pacemaker 1.1 shows the same behavior."
>>
>>>
>>>> The latest from git seems to work fine:
>>>>
>>>> Current cluster status:
>>>> Online: [ bl460g1n13 bl460g1n14 ]
>>>>
>>>>  Resource Group: grpDRBD
>>>>      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>>>>      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>>>>      dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>>>>  Master/Slave Set: msDRBD [prmDRBD]
>>>>      Masters: [ bl460g1n13 ]
>>>>      Slaves: [ bl460g1n14 ]
>>>>
>>>> Transition Summary:
>>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
>>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
>>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
>>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0 (Master bl460g1n13)
>>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1 (Slave bl460g1n14)
>>>>
>>>> Executing cluster transition:
>>>>  * Executing action 14: dummy03_stop_0 on bl460g1n13
>>>>  * Executing action 12: dummy02_stop_0 on bl460g1n13
>>>>  * Executing action 2: dummy01_stop_0 on bl460g1n13
>>>>  * Executing action 11: dummy01_start_0 on bl460g1n13
>>>>  * Executing action 1: dummy01_monitor_10000 on bl460g1n13
>>>>  * Executing action 13: dummy02_start_0 on bl460g1n13
>>>>  * Executing action 3: dummy02_monitor_10000 on bl460g1n13
>>>>  * Executing action 15: dummy03_start_0 on bl460g1n13
>>>>  * Executing action 4: dummy03_monitor_10000 on bl460g1n13
>>>
>>> dummy01 got the fail-count,
>>> so dummy01 should move from bl460g1n13 to bl460g1n14.
>>> Why does it restart on the failed node?
>>>
>>> I got the latest changeset from hg;
>>>
>>> # hg log | head -n 7
>>> changeset:   15777:a15ead49e20f
>>> branch:      stable-1.0
>>> tag:         tip
>>> user:        Andrew Beekhof <and...@beekhof.net>
>>> date:        Thu Aug 25 16:49:59 2011 +1000
>>> summary:     changeset:   15775:fe18a1ad46f8
>>>
>>> # crm
>>> crm(live)# cib import pe-input-7.bz2
>>> crm(pe-input-7)# configure ptest vvv
>>> ptest[19194]: 2011/09/27_11:53:45 notice: unpack_config: On loss of CCM Quorum: Ignore
>>> ptest[19194]: 2011/09/27_11:53:45 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>>> ptest[19194]: 2011/09/27_11:53:45 notice: group_print: Resource Group: grpDRBD
>>> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
>>> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>>> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>>> ptest[19194]: 2011/09/27_11:53:45 notice: clone_print: Master/Slave Set: msDRBD
>>> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Masters: [ bl460g1n13 ]
>>> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Slaves: [ bl460g1n14 ]
>>> ptest[19194]: 2011/09/27_11:53:45 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
>>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy01 (bl460g1n13)
>>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy02 (bl460g1n13)
>>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy03 (bl460g1n13)
>>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:0 (Master bl460g1n13)
>>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:1 (Slave bl460g1n14)
>>> INFO: install graphviz to see a transition graph
>>> crm(pe-input-7)# quit
>>>
>>>
>>> reverts to Pacemaker 1.0.11,
>>>
>>> # hg revert -a -r b2e39d318fda
>>> # make install
>>>
>>> # crm
>>> crm(live)# cib import pe-input-7.bz2
>>> crm(pe-input-7)# configure ptest vvv
>>> ptest[751]: 2011/09/27_11:57:50 notice: unpack_config: On loss of CCM Quorum: Ignore
>>> ptest[751]: 2011/09/27_11:57:50 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>>> ptest[751]: 2011/09/27_11:57:50 notice: group_print: Resource Group: grpDRBD
>>> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
>>> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>>> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>>> ptest[751]: 2011/09/27_11:57:50 notice: clone_print: Master/Slave Set: msDRBD
>>> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Masters: [ bl460g1n13 ]
>>> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Slaves: [ bl460g1n14 ]
>>> ptest[751]: 2011/09/27_11:57:50 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
>>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy01 on bl460g1n14
>>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy02 on bl460g1n14
>>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy03 on bl460g1n14
>>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
>>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
>>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
>>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
>>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)
>>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy02 (Started bl460g1n13 -> bl460g1n14)
>>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy03 (Started bl460g1n13 -> bl460g1n14)
>>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)
>>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1 (Slave -> Master bl460g1n14)
>>> INFO: install graphviz to see a transition graph
>>>
>>> Pacemaker 1.0.10 moved the failure resource to the other node.
>>> It's the expected behavior.
>>>
>>> I attached the hb_report which includes the above pe-input-7.bz2.
>>>
>>> Thanks,
>>> Junko
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>
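
For anyone comparing the two ptest runs quoted above, here is a minimal Python sketch that pulls the LogActions entries out of each log and lists the resources whose planned action differs. It is not part of Pacemaker: the names `planned_actions` and `diff_actions` are made up for illustration, and the pattern is only an assumption based on the output format quoted in this thread. It also assumes each log entry sits on a single line, i.e. mail-wrapped lines have been rejoined.

```python
import re

# Pattern for entries such as
#   "... LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)"
#   "... LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)"
# The word "resource" is optional because both forms appear in the thread.
LOG_ACTION = re.compile(
    r"LogActions:\s+(?P<action>\w+)\s+(?:resource\s+)?"
    r"(?P<resource>\S+)\s+\((?P<detail>[^)]*)\)"
)


def planned_actions(log_text):
    """Return {resource: (action, detail)} for every LogActions entry."""
    actions = {}
    for line in log_text.splitlines():
        match = LOG_ACTION.search(line)  # search() skips any "> " quote prefix
        if match:
            actions[match.group("resource")] = (
                match.group("action"),
                match.group("detail"),
            )
    return actions


def diff_actions(old_log, new_log):
    """Return {resource: (old, new)} for resources whose plan changed."""
    old, new = planned_actions(old_log), planned_actions(new_log)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Entries copied from the two runs quoted in this thread (rejoined to one line each).
TIP_LOG = """\
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy01 (bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy02 (bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy03 (bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:0 (Master bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:1 (Slave bl460g1n14)
"""

REVERTED_LOG = """\
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy02 (Started bl460g1n13 -> bl460g1n14)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy03 (Started bl460g1n13 -> bl460g1n14)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1 (Slave -> Master bl460g1n14)
"""

if __name__ == "__main__":
    for name, (old, new) in diff_actions(REVERTED_LOG, TIP_LOG).items():
        print(f"{name}: {old} -> {new}")
```

Running the same comparison against other pe-input files from the hb_report should make it easier to see in which transitions the group is only stopped instead of moved.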