I tried to look at this one (finally!), but the two PE files I need (pe-input-7.bz2 and pe-input-8.bz2 from cspm01) are missing. Very strange.
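For readers following the report quoted below, here is a minimal standalone sketch of how the two injected faults in step (2) produce the return codes listed under "Failed actions". The monitor on a resource whose state file was deleted returns 7 (OCF_NOT_RUNNING, shown as "not running"), and the forced early exit in stop returns 1 (OCF_ERR_GENERIC, shown as "unknown error"). This is an imitation, not the real Dummy RA: the constants are hard-coded here instead of coming from the OCF shell function library, the monitor is assumed to behave roughly like Dummy's (success only while the state file exists), and STATE_FILE is a hypothetical stand-in for ${OCF_RESKEY_state}.

```sh
#!/bin/sh
# Sketch only: imitates the modified Dummy RA's monitor/stop outside the cluster.
# Standard OCF return codes; a real RA gets these from the OCF shell library.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical state file standing in for ${OCF_RESKEY_state}
STATE_FILE=$(mktemp)

dummy_monitor() {
    # Assumed Dummy-like behavior: "running" only while the state file exists
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

dummy_stop() {
    # Injected fault: fail before any real stop logic runs.
    # The edited RA uses "exit"; "return" keeps this sketch's shell alive.
    return $OCF_ERR_GENERIC
}

# Same as the "rm -f .../Dummy-clnUMdummy01:0.state" step on cspm01
rm -f "$STATE_FILE"

dummy_monitor; monitor_rc=$?
dummy_stop;    stop_rc=$?

echo "monitor rc=$monitor_rc"   # 7 -> "not running"
echo "stop rc=$stop_rc"         # 1 -> "unknown error"
```

The rc values match the `rc=7` monitor failures and the `rc=1` stop failure in the quoted crm_mon output.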
2010/3/18 Andrew Beekhof <and...@beekhof.net>:
> 2010/3/16 Junko IKEDA <ike...@intellilink.co.jp>:
>> Hi,
>>
>> There is a slightly strange clone behavior.
>> I found the following:
>>
>> (1) Start the group, which contains three primitive resources,
>> and the clone set.
>>
>> # crm_mon -1
>>
>> ============
>> Last updated: Tue Mar 16 21:39:10 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>  Resource Group: UMgroup01
>>      UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
>>      UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
>>      UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
>>  Clone Set: clnUMgroup01
>>      Started: [ cspm01 cspm04 ]
>>
>> (2) Edit the Dummy RA so that stopping clnUMgroup01 fails.
>>
>> # vim /usr/lib/ocf/resource.d/heartbeat/Dummy01
>> -----------------------------------------------
>> dummy_stop() {
>>     exit $OCF_ERR_GENERIC # intentional error
>>
>>     dummy_monitor
>>     if [ $? = $OCF_SUCCESS ]; then
>>         rm ${OCF_RESKEY_state}
>>     fi
>>     return $OCF_SUCCESS
>> }
>> -----------------------------------------------
>>
>> (on cspm01)
>> # rm -f /var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
>>
>> (3) Check the status of each resource.
>>
>> # crm_mon -1
>>
>> ============
>> Last updated: Tue Mar 16 21:40:11 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>  Clone Set: clnUMgroup01
>>      Resource Group: clnUmResource:0
>>          clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
>>          clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
>>      Started: [ cspm04 ]
>>
>> Failed actions:
>>     clnUMdummy01:0_monitor_10000 (node=cspm01, call=8, rc=7, status=complete): not running
>>     clnUMdummy01:0_stop_0 (node=cspm01, call=18, rc=1, status=complete): unknown error
>>     UmDummy03_monitor_10000 (node=cspm01, call=16, rc=7, status=complete): not running
>>     UmDummy01_monitor_10000 (node=cspm01, call=12, rc=7, status=complete): not running
>>     clnUMdummy02:0_monitor_10000 (node=cspm01, call=10, rc=7, status=complete): not running
>>
>>
>> In this case, the clone instance on cspm04 keeps running.
>
> Which makes sense. It hasn't failed, so there's no reason to stop it.
>
>>
>> But when I added another resource to the group, like this:
>>
>> ============
>> Last updated: Tue Mar 16 21:53:26 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>  Resource Group: UMgroup01
>>      UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
>>      UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
>>      UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
>>      UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
>>  Clone Set: clnUMgroup01
>>      Started: [ cspm01 cspm04 ]
>>
>>
>> and then triggered the same error as above,
>> the result of crm_mon was strange.
>>
>> ============
>> Last updated: Tue Mar 16 21:54:46 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>  Clone Set: clnUMgroup01
>>      Resource Group: clnUmResource:0
>>          clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
>>          clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
>>      Stopped: [ clnUmResource:1 ]
>>
>> Failed actions:
>>     clnUMdummy01:0_monitor_10000 (node=cspm01, call=9, rc=7, status=complete): not running
>>     clnUMdummy01:0_stop_0 (node=cspm01, call=21, rc=1, status=complete): unknown error
>>
>>
>> In this case, the clone instance on cspm04 was stopped.
>> I didn't change the rsc_colocation or order settings.
>> Which case is expected?
>
> The first. You could be seeing a bug that's already fixed, though.
> With 1.0.8 it wants to start the clone:
>
> [11:22 AM] beek...@mobile ~/Development/pacemaker/stable-1.0 # pengine/ptest -VVV -x /Users/beekhof/Downloads/Dummy_x4/cspm01/pengine/pe-warn-2.bz2
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_monitor_10000 on cspm01: not running (7)
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_stop_0 on cspm01: unknown error (1)
> ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource Group: UMgroup01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:     UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:     UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:     UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:     UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: clone_print: Clone Set: clnUMgroup01
> ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource Group: clnUmResource:0
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:     clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:     clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
> ptest[21686]: 2010/03/18_11:22:08 notice: short_print: Stopped: [ clnUmResource:1 ]
> ptest[21686]: 2010/03/18_11:22:08 WARN: common_apply_stickiness: Forcing clnUMgroup01 away from cspm01 after 1000000 failures (max=10)
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy01 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy02 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy03 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy04 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for clnUMdummy01:1 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for clnUMdummy02:1 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy01 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy02 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy03 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy04 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy01:0 (Started unmanaged)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy02:0 (Stopped)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy01:1 (cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy02:1 (cspm04)
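The LogActions lines are the quickest way to see what the policy engine plans to do, e.g. the "Start clnUMdummy01:1 (cspm04)" line that shows it wants to start the clone. Here is a minimal sketch that filters saved ptest output down to just those lines, assuming the output has already been captured as text. The function name summarize_actions is hypothetical, and the sample input is a small subset of the ptest lines quoted above.

```sh
#!/bin/sh
# Sketch: keep only "LogActions:" lines and strip the
# "ptest[pid]: timestamp level: LogActions: " prefix from each.
summarize_actions() {
    sed -n 's/^.*LogActions: //p'
}

# Sample input: a subset of the ptest lines quoted in this thread
summary=$(summarize_actions <<'EOF'
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for clnUMdummy01:1 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy01 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy01:0 (Started unmanaged)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy01:1 (cspm04)
EOF
)

printf '%s\n' "$summary"
```

The RecurringOp line is dropped, leaving the move of UmDummy01, the unmanaged clnUMdummy01:0 being left alone, and the start of clnUMdummy01:1 on cspm04.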
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf