2010/3/16 Junko IKEDA <ike...@intellilink.co.jp>:
> Hi,
>
> There is just a little strange clone behavior.
> I found that;
>
> (1) start the group which contains three primitive resources,
> and clone set
>
> # crm_mon -1
>
> ============
> Last updated: Tue Mar 16 21:39:10 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Resource Group: UMgroup01
>     UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
>     UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
>     UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
> Clone Set: clnUMgroup01
>     Started: [ cspm01 cspm04 ]
>
> (2) edit Dummy RA to create clnUMgroup01 stop NG.
>
> # vim /usr/lib/ocf/resource.d/heartbeat/Dummy01
> -----------------------------------------------
> dummy_stop() {
>     exit $OCF_ERR_GENERIC # intentional error
>
>     dummy_monitor
>     if [ $? = $OCF_SUCCESS ]; then
>         rm ${OCF_RESKEY_state}
>     fi
>     return $OCF_SUCCESS
> }
> -----------------------------------------------
>
> (on cspm01)
> # rm -f /var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
>
> (3) check the status of each resources
>
> # crm_mon -1
>
> ============
> Last updated: Tue Mar 16 21:40:11 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Clone Set: clnUMgroup01
>     Resource Group: clnUmResource:0
>         clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
>         clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
>     Started: [ cspm04 ]
>
> Failed actions:
>     clnUMdummy01:0_monitor_10000 (node=cspm01, call=8, rc=7, status=complete): not running
>     clnUMdummy01:0_stop_0 (node=cspm01, call=18, rc=1, status=complete): unknown error
>     UmDummy03_monitor_10000 (node=cspm01, call=16, rc=7, status=complete): not running
>     UmDummy01_monitor_10000 (node=cspm01, call=12, rc=7, status=complete): not running
>     clnUMdummy02:0_monitor_10000 (node=cspm01, call=10, rc=7, status=complete): not running
>
>
> In this case, clone instance on cspm04 keeps running.

Which makes sense. It hasn't failed, so there's no reason to stop it.

>
> but when I added the other resource in group, like this;
>
> ============
> Last updated: Tue Mar 16 21:53:26 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Resource Group: UMgroup01
>     UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
>     UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
>     UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
>     UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
> Clone Set: clnUMgroup01
>     Started: [ cspm01 cspm04 ]
>
>
> after the same error as the above,
> the result of crm_mon was strange.
>
> ============
> Last updated: Tue Mar 16 21:54:46 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Clone Set: clnUMgroup01
>     Resource Group: clnUmResource:0
>         clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
>         clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
>     Stopped: [ clnUmResource:1 ]
>
> Failed actions:
>     clnUMdummy01:0_monitor_10000 (node=cspm01, call=9, rc=7, status=complete): not running
>     clnUMdummy01:0_stop_0 (node=cspm01, call=21, rc=1, status=complete): unknown error
>
>
> In this case, clone instance on cspm04 was stopped.
> I didn't change the rsc_colocation or order setting.
> Which case is the expected?

The first. You could be seeing a bug that's already fixed, though.
With 1.0.8 it wants to start the clone:

[11:22 AM] beek...@mobile ~/Development/pacemaker/stable-1.0 # pengine/ptest -VVV -x /Users/beekhof/Downloads/Dummy_x4/cspm01/pengine/pe-warn-2.bz2
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_monitor_10000 on cspm01: not running (7)
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_stop_0 on cspm01: unknown error (1)
ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource Group: UMgroup01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: clone_print: Clone Set: clnUMgroup01
ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource Group: clnUmResource:0
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
ptest[21686]: 2010/03/18_11:22:08 notice: short_print: Stopped: [ clnUmResource:1 ]
ptest[21686]: 2010/03/18_11:22:08 WARN: common_apply_stickiness: Forcing clnUMgroup01 away from cspm01 after 1000000 failures (max=10)
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy01 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy02 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy03 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy04 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for clnUMdummy01:1 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for clnUMdummy02:1 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy01 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy02 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy03 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy04 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy01:0 (Started unmanaged)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy02:0 (Stopped)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy01:1 (cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy02:1 (cspm04)

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker