On Tue, Nov 9, 2010 at 2:14 PM, Vadim S. Khondar <v.khon...@o3.ua> wrote:
> On Tue, 2010-11-09 at 09:49 +0100, Andrew Beekhof wrote:
>> being unmanaged is a side-effect of a) the resource failing to stop
>> and b) no fencing being configured
>> once you've fixed the error, run crm resource cleanup as misch suggested
>>
>
> I understand that.
> However, for example, in situation when VPS fails to start (not to stop)

It's failing to stop too:

    ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
    ^^^^^^^^

> because of lack of configuration file and due to this becomes unmanaged,
> I run:
>
> crm(live)# status
> ============
> Last updated: Tue Nov 9 14:53:09 2010
> Stack: Heartbeat
> Current DC: ha-3 (a1ad8f56-7eb0-4aec-8d32-83e283903879) - partition with
> quorum
> Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ ha-3 ha-4 ]
>
> test_ManageVE (ocf::heartbeat:ManageVE): Started ha-3
> ca (ocf::heartbeat:ManageVE): Started ha-3 (unmanaged) FAILED
>
> Failed actions:
> ca_start_0 (node=ha-3, call=48, rc=5, status=complete): not
> installed
> ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
>
> After fixing the issue (and checking that VPS really can be started via
> shell):
>
> crm(live)# resource cleanup ca
> Cleaning up ca on ha-3
> Cleaning up ca on ha-4
>
>
> Got the following in /var/log/messages on current DC ha-3:
>
> Nov 9 14:58:19 ha-3 crmd: [8434]: notice: do_lrm_invoke: Not creating
> resource for a delete event: (null)
> Nov 9 14:58:19 ha-3 crmd: [8434]: info: send_direct_ack: ACK'ing
> resource op ca_delete_60000 from 0:0:crm-resource-17296:
> lrm_invoke-lrmd-1289307499-777
> Nov 9 14:58:20 ha-3 attrd: [8433]: info: attrd_ha_callback: Update
> relayed from ha-4
> Nov 9 14:58:25 ha-3 lrmd: [8431]: info: Resource Agent output: []
> Nov 9 14:58:25 ha-3 lrmd: [8431]: notice: read's ret: 0 when lrmd_op
> finished
>
> crm(live)# resource manage ca
> Log:
> Nov 9 15:00:48 ha-3 cib: [8430]: info: cib_process_request: Operation
> complete: op cib_replace for section resources (origin=ha-4/cibadmin/2,
> version=0.92.2): ok (rc=0)
>
> And after this still:
> Online: [ ha-3 ha-4 ]
>
> test_ManageVE (ocf::heartbeat:ManageVE): Started ha-3
> ca (ocf::heartbeat:ManageVE): Started ha-3 (unmanaged) FAILED
>
> Failed actions:
> ca_start_0 (node=ha-3, call=48, rc=5, status=complete): not
> installed
> ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
>
>
> If after this I edit CIB and apply it, all LRM messages disappear and
> resource starts managed as it should.
> Seems like cleanup does not clean all the status information.
>
> What am I missing?

Possibly an ordering constraint. Otherwise, no idea.
Depends on how your resource agent works.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
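
Editorial aside: the point made in this thread is that a failed stop (with no fencing configured) is what leaves the resource unmanaged, so it helps to check the "Failed actions" list for stop entries and not only for the start error. The following is a minimal sketch of that check, assuming status text in the format quoted in this thread. The names `parse_failed_actions` and `stop_failures` are hypothetical helpers, not part of Pacemaker or the crm shell, and the regular expression is only an assumption about how these lines look in this version's output. The return-code labels follow the standard OCF resource agent exit codes.

```python
import re

# Standard OCF resource agent exit codes and the labels shown in status output.
OCF_RC = {
    0: "ok",
    1: "unknown error",
    2: "invalid parameter",
    3: "unimplemented feature",
    4: "insufficient privileges",
    5: "not installed",
    6: "not configured",
    7: "not running",
}

# Assumed shape of a failed-action line, taken from the output in this thread:
#   ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
FAILED_OP = re.compile(
    r"(?P<resource>\w+)_(?P<action>start|stop|monitor)_(?P<interval>\d+)\s+"
    r"\(node=(?P<node>[^,]+),\s*call=(?P<call>\d+),\s*rc=(?P<rc>\d+),"
)


def parse_failed_actions(status_text):
    """Return one dict per failed operation found in the status text."""
    ops = []
    for m in FAILED_OP.finditer(status_text):
        rc = int(m.group("rc"))
        ops.append({
            "resource": m.group("resource"),
            "action": m.group("action"),
            "node": m.group("node"),
            "rc": rc,
            "meaning": OCF_RC.get(rc, "unrecognized code"),
        })
    return ops


def stop_failures(ops):
    """Names of resources with a failed stop, sorted and de-duplicated."""
    return sorted({op["resource"] for op in ops if op["action"] == "stop"})


# Sample text copied from the status output quoted in this thread.
SAMPLE = """
test_ManageVE (ocf::heartbeat:ManageVE): Started ha-3
ca (ocf::heartbeat:ManageVE): Started ha-3 (unmanaged) FAILED

Failed actions:
ca_start_0 (node=ha-3, call=48, rc=5, status=complete): not installed
ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
"""

if __name__ == "__main__":
    failed = parse_failed_actions(SAMPLE)
    for op in failed:
        print(op["resource"], op["action"], op["node"], op["rc"], op["meaning"])
    print("resources with a failed stop:", stop_failures(failed))
```

Any resource reported by `stop_failures` is a candidate for the unmanaged state described above. The sketch only reads status text; it does not clear anything, so `crm resource cleanup` is still needed once the underlying error is fixed.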