Hello,

On 01/19/2012 01:39 PM, agutxi Agustin wrote:
> Hi all,
> I am trying to set up a cluster of virtual machine hosts, and while
> doing so, I came across a very strange behaviour (I think it may be
> a bug) and I hope you can lend me a hand in debugging this.
> To reproduce the behaviour observed in my production environment, I set
> up 2 new simple machines with no location/colocation/order
> constraints, and replaced the Xen resource agent with the Dummy
> resource agent, and the behaviour was the same.

Have you tried Pacemaker 1.1.6? There have been some utilization fixes.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

>
> The scenario is the following:
> - The strategy is "utilization".
> - 2 nodes: vmHost1 and vmHost2, with 2 cores each, handle 5 resources:
> DummyVM001-005, with resource-stickiness="INFINITY".
>
> What happens is the following:
> - If I start resources DummyVM001-004, everything is fine. 2 resources
> run on each of the nodes.
> - Now, I start DummyVM005, but utilization is full, so it does not
> start (cool :)
> - Then, if I stop any of the running resources, everything goes
> smoothly and DummyVM005 starts up. That's cool too.
> * Here comes the strange part:
> I have full utilization and resource stickiness INFINITY, so starting
> new resources shouldn't change anything in the cluster status, BUT
> if any of the freshly restarted resources sorts alphabetically
> before a running one, the cluster stops the alphabetically "last"
> running resource and starts the stopped one.
> I don't think this is the expected behaviour; please correct me if I
> am wrong.
>
> Thank you kindly,
> Agustin
>
> status:
> ====
> Online: [ vmHost1 vmHost2 ]
>
> DummyVM1 (ocf::pacemaker:Dummy): Started vmHost1
> DummyVM2 (ocf::pacemaker:Dummy): Started vmHost1
> DummyVM3 (ocf::pacemaker:Dummy): Started vmHost2
> DummyVM5 (ocf::pacemaker:Dummy): Started vmHost2
> crm(live)# resource start DummyVM4
>
> My configuration:
> ============
> crm(live)# configure show
> node vmHost1 \
>     utilization cores="2"
> node vmHost2 \
>     utilization cores="2"
> primitive DummyVM1 ocf:pacemaker:Dummy \
>     op monitor interval="60s" timeout="60s" \
>     op start on-fail="restart" interval="0" \
>     op stop on-fail="ignore" interval="0" \
>     utilization cores="1" \
>     meta is-managed="true" migration-threshold="2"
> primitive DummyVM2 ocf:pacemaker:Dummy \
>     op monitor interval="60s" timeout="60s" \
>     op start on-fail="restart" interval="0" \
>     op stop on-fail="ignore" interval="0" \
>     utilization cores="1" \
>     meta is-managed="true" migration-threshold="2"
> primitive DummyVM3 ocf:pacemaker:Dummy \
>     op monitor interval="60s" timeout="60s" \
>     op start on-fail="restart" interval="0" \
>     op stop on-fail="ignore" interval="0" \
>     utilization cores="1" \
>     meta is-managed="true" migration-threshold="2"
> primitive DummyVM4 ocf:pacemaker:Dummy \
>     op monitor interval="60s" timeout="60s" \
>     op start on-fail="restart" interval="0" \
>     op stop on-fail="ignore" interval="0" \
>     utilization cores="1" \
>     meta is-managed="true" migration-threshold="2" target-role="Started"
> primitive DummyVM5 ocf:pacemaker:Dummy \
>     op monitor interval="60s" timeout="60s" \
>     op start on-fail="restart" interval="0" \
>     op stop on-fail="ignore" interval="0" \
>     utilization cores="1" \
>     meta is-managed="true" migration-threshold="2" target-role="Started"
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     stop-all-resources="false" \
>     placement-strategy="utilization" \
>     no-quorum-policy="ignore" \
>     cluster-infrastructure="openais" \
>     stop-orphan-resources="true" \
>     stop-orphan-actions="true" \
>     last-lrm-refresh="1326975274"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="INFINITY"
>
>
> and the important part in /var/log/syslog :
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <cib admin_epoch="0" epoch="75" num_updates="4" >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <configuration >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <resources >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <primitive id="DummyVM4" >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <meta_attributes id="DummyVM4-meta_attributes" >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <nvpair value="Stopped" id="DummyVM4-meta_attributes-target-role" />
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </meta_attributes>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </primitive>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </resources>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </configuration>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </cib>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <cib epoch="76" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.5" have-quorum="1" cib-last-written="Thu Jan 19 12:35:47 2012" dc-uuid="vmHost1" >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <configuration >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <resources >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <primitive class="ocf" id="DummyVM4" provider="pacemaker" type="Dummy" >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <meta_attributes id="DummyVM4-meta_attributes" >
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <nvpair id="DummyVM4-meta_attributes-target-role" name="target-role" value="Started" />
> Jan 19 13:36:19 vmHost1
> cib: [725]: info: cib:diff+ </meta_attributes>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </primitive>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </resources>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </configuration>
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </cib>
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: abort_transition_graph: te_update_diff:131 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.76.1) : Non-status change
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_pe_invoke: Query 191: Requesting the current CIB: S_POLICY_ENGINE
> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib_process_request: Operation complete: op cib_replace for section resources (origin=local/cibadmin/2, version=0.76.1): ok (rc=0)
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_pe_invoke_callback: Invoking the PE: query=191, ref=pe_calc-dc-1326976579-119, seq=64, quorate=1
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print: DummyVM1#011(ocf::pacemaker:Dummy):#011Started vmHost1
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print: DummyVM2#011(ocf::pacemaker:Dummy):#011Started vmHost1
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print: DummyVM3#011(ocf::pacemaker:Dummy):#011Started vmHost2
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print: DummyVM4#011(ocf::pacemaker:Dummy):#011Stopped
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print: DummyVM5#011(ocf::pacemaker:Dummy):#011Started vmHost2
> Jan 19 13:36:19 vmHost1 pengine: [728]:
> notice: RecurringOp: Start recurring monitor (60s) for DummyVM4 on vmHost2
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave DummyVM1#011(Started vmHost1)
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave DummyVM2#011(Started vmHost1)
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave DummyVM3#011(Started vmHost2)
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Start DummyVM4#011(vmHost2)
> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Stop DummyVM5#011(vmHost2)
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: unpack_graph: Unpacked transition 33: 6 actions in 6 synapses
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_te_invoke: Processing graph 33 (ref=pe_calc-dc-1326976579-119) derived from /var/lib/pengine/pe-input-717.bz2
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating action 19: stop DummyVM5_stop_0 on vmHost2
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo action 6 fired and confirmed
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action DummyVM5_stop_0 (19) confirmed on vmHost2 (rc=0)
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo action 7 fired and confirmed
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo action 5 fired and confirmed
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating action 17: start DummyVM4_start_0 on vmHost2
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action DummyVM4_start_0 (17) confirmed on vmHost2 (rc=0)
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating action 18: monitor DummyVM4_monitor_60000 on vmHost2
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event:
> Action DummyVM4_monitor_60000 (18) confirmed on vmHost2 (rc=0)
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: run_graph: ====================================================
> Jan 19 13:36:19 vmHost1 crmd: [729]: notice: run_graph: Transition 33 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-717.bz2): Complete
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_graph_trigger: Transition 33 is now complete
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: notify_crmd: Transition 33 status: done - <null>
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: Starting PEngine Recheck Timer
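
For reference, the capacity arithmetic behind the quoted scenario can be sketched as a small toy model in Python. This is only an illustration of the behaviour the original poster expected (running resources stay put when all cores are in use); it is not Pacemaker's placement algorithm. The function name `place`, the constants, and the tie-breaking rule are all assumptions made up for this example.

```python
# Toy model of the quoted scenario. NOT Pacemaker's policy engine:
# every name below is hypothetical and exists only for illustration.

NODE_CAPACITY = {"vmHost1": 2, "vmHost2": 2}  # utilization cores="2" per node
RESOURCE_COST = 1                             # utilization cores="1" per resource


def place(running, requested):
    """Return a placement {resource: node}.

    running:   resources that are already started; they are never moved
               or stopped (this models resource-stickiness="INFINITY").
    requested: names of all resources that should be started.
    Resources that do not fit anywhere are simply left out of the result.
    """
    placement = dict(running)
    free = dict(NODE_CAPACITY)
    for node in placement.values():
        free[node] -= RESOURCE_COST

    for rsc in sorted(requested):
        if rsc in placement:
            continue  # already running, leave it where it is
        # Assumed rule: pick the node with the most free cores,
        # breaking ties by node name.
        node = max(sorted(free), key=lambda n: free[n])
        if free[node] >= RESOURCE_COST:
            placement[rsc] = node
            free[node] -= RESOURCE_COST
    return placement


# State from the quoted status output, just before "resource start DummyVM4".
running = {
    "DummyVM1": "vmHost1",
    "DummyVM2": "vmHost1",
    "DummyVM3": "vmHost2",
    "DummyVM5": "vmHost2",
}
all_resources = ["DummyVM1", "DummyVM2", "DummyVM3", "DummyVM4", "DummyVM5"]

result = place(running, all_resources)
print(result)
```

Under these assumptions DummyVM4 stays stopped and DummyVM5 keeps running, whereas the quoted log shows the cluster stopping DummyVM5 and starting DummyVM4 instead.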
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org