On Monday, June 06, 2011 03:11:24 Errol Neal wrote:
> On Fri, 06/03/2011 12:31 PM, imnotpc <imno...@rock3d.net> wrote:
> > I have a working 3 node cluster with a couple of resources defined. If I
> > shutdown a node crm_mon shows the cluster correctly identifies the node,
> > marks it as offline, and moves any resources on it. The fencing resource
> > (I've tried both ssh and meatware) also sees it as down and marks it
> > stopped. So far so good. I was expecting a console warning or a shutdown
> > attempt but nothing happens. I checked the logs and can see that stonith
> > sees the event but I don't see any actions taken. "crm_verify -L"
> > doesn't show any problems. What else should I do to
> > troubleshoot/configure this?
>
> You should probably begin by posting your config so we can have some
> additional context. What stonith devices do you have configured?

Right now I have meatware as the stonith device.

<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Mon Jun 6 08:45:09 2011" crm_feature_set="3.0.5" dc-uuid="JeffDesk.LAN" epoch="17" have-quorum="1" num_updates="81" validate-with="pacemaker-1.2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.fc15-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="3"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="Server4.LAN" type="normal" uname="Server4.LAN"/>
      <node id="JeffDesk.LAN" type="normal" uname="JeffDesk.LAN"/>
      <node id="Server2.LAN" type="normal" uname="Server2.LAN"/>
    </nodes>
    <resources>
      <primitive class="ocf" id="ClusterIP" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="ClusterIP-instance_attributes">
          <nvpair id="ClusterIP-instance_attributes-ip" name="ip" value="192.168.0.200"/>
          <nvpair id="ClusterIP-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/>
        </instance_attributes>
        <operations>
          <op id="ClusterIP-monitor-30s" interval="30s" name="monitor"/>
        </operations>
      </primitive>
      <clone id="Fencing">
        <primitive class="stonith" id="meatware-fence" type="meatware">
          <instance_attributes id="meatware-fence-instance_attributes">
            <nvpair id="meatware-fence-instance_attributes-hostlist" name="hostlist" value="JeffDesk.LAN Server2.LAN Server4.LAN"/>
          </instance_attributes>
        </primitive>
      </clone>
    </resources>
    <constraints/>
  </configuration>
</cib>

When I shut down a node I see this in the logs:

[...]
Jun 6 09:53:00 Server2 crmd: [2362]: info: handle_shutdown_request: Creating shutdown request for Server4.LAN (state=S_IDLE)
Jun 6 09:53:00 Server2 crmd: [2362]: info: abort_transition_graph: te_update_diff:149 - Triggered transition abort (complete=1, tag=nvpair, id=status-Server4.LAN-shutdown, magic=NA, cib=0.17.208) : Transient attribute: update
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: All 3 cluster nodes are eligible to run resources.
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_pe_invoke: Query 84: Requesting the current CIB: S_POLICY_ENGINE
Jun 6 09:53:00 Server2 pengine: [2361]: notice: native_print: ClusterIP#011(ocf::heartbeat:IPaddr2):#011Started Server2.LAN
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_pe_invoke_callback: Invoking the PE: query=84, ref=pe_calc-dc-1307368380-52, seq=252, quorate=1
Jun 6 09:53:00 Server2 pengine: [2361]: notice: clone_print: Clone Set: Fencing [meatware-fence]
Jun 6 09:53:00 Server2 pengine: [2361]: notice: short_print: Started: [ Server2.LAN JeffDesk.LAN Server4.LAN ]
Jun 6 09:53:00 Server2 pengine: [2361]: notice: stage6: Scheduling Node Server4.LAN for shutdown
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave ClusterIP#011(Started Server2.LAN)
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave meatware-fence:0#011(Started Server2.LAN)
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave meatware-fence:1#011(Started JeffDesk.LAN)
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Stop meatware-fence:2#011(Server4.LAN)
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun 6 09:53:00 Server2 crmd: [2362]: info: unpack_graph: Unpacked transition 4: 4 actions in 4 synapses
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1307368380-52) derived from /var/lib/pengine/pe-input-67.bz2
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_pseudo_action: Pseudo action 16 fired and confirmed
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_rsc_command: Initiating action 13: stop meatware-fence:2_stop_0 on Server4.LAN
Jun 6 09:53:00 Server2 crmd: [2362]: info: match_graph_event: Action meatware-fence:2_stop_0 (13) confirmed on Server4.LAN (rc=0)
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_pseudo_action: Pseudo action 17 fired and confirmed
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_crm_command: Executing crm-event (20): do_shutdown on Server4.LAN
Jun 6 09:53:00 Server2 crmd: [2362]: info: run_graph: ====================================================
Jun 6 09:53:00 Server2 crmd: [2362]: notice: run_graph: Transition 4 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-67.bz2): Complete
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_graph_trigger: Transition 4 is now complete
Jun 6 09:53:00 Server2 crmd: [2362]: info: notify_crmd: Transition 4 status: done - <null>
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: Starting PEngine Recheck Timer
Jun 6 09:53:00 Server2 pacemakerd: [2353]: info: update_node_processes: Node Server4.LAN now has process list: 00000000000000000000000000111112 (was 00000000000000000000000000111312)
Jun 6 09:53:00 Server2 stonith-ng: [2357]: info: crm_update_peer: Node Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 attrd: [2360]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 crmd: [2362]: notice: crmd_peer_update: Status update: Client Server4.LAN/crmd now has status [offline] (DC=true)
Jun 6 09:53:00 Server2 crmd: [2362]: info: erase_node_from_join: Removed node Server4.LAN from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Jun 6 09:53:00 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 pacemakerd: [2353]: info: update_node_processes: Node Server4.LAN now has process list: 00000000000000000000000000101112 (was 00000000000000000000000000111112)
Jun 6 09:53:00 Server2 stonith-ng: [2357]: info: crm_update_peer: Node Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000101112 (new)
Jun 6 09:53:00 Server2 attrd: [2360]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000101112 (new)
Jun 6 09:53:00 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000101112 (new)
Jun 6 09:53:00 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000101112 (new)
Jun 6 09:53:01 Server2 pacemakerd: [2353]: info: update_node_processes: Node Server4.LAN now has process list: 00000000000000000000000000100112 (was 00000000000000000000000000101112)
Jun 6 09:53:01 Server2 attrd: [2360]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 stonith-ng: [2357]: info: crm_update_peer: Node Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 pacemakerd: [2353]: info: update_node_processes: Node Server4.LAN now has process list: 00000000000000000000000000100102 (was 00000000000000000000000000100112)
Jun 6 09:53:01 Server2 attrd: [2360]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 stonith-ng: [2357]: info: crm_update_peer: Node Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 cib: [2358]: info: cib_process_shutdown_req: Shutdown REQ from Server4.LAN
Jun 6 09:53:01 Server2 cib: [2358]: info: cib_process_request: Operation complete: op cib_shutdown_req for section 'all' (origin=Server4.LAN/Server4.LAN/(null), version=0.17.210): ok (rc=0)
Jun 6 09:53:06 Server2 pacemakerd: [2353]: info: update_node_processes: Node Server4.LAN now has process list: 00000000000000000000000000100002 (was 00000000000000000000000000100102)
Jun 6 09:53:06 Server2 stonith-ng: [2357]: info: crm_update_peer: Node Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 attrd: [2360]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 pacemakerd: [2353]: info: update_node_processes: Node Server4.LAN now has process list: 00000000000000000000000000000002 (was 00000000000000000000000000100002)
Jun 6 09:53:06 Server2 attrd: [2360]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000000002 (new)
Jun 6 09:53:06 Server2 stonith-ng: [2357]: info: crm_update_peer: Node Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000000002 (new)
Jun 6 09:53:06 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000000002 (new)
Jun 6 09:53:06 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252 proc=00000000000000000000000000000002 (new)
[...]

The reference to pseudo actions seems suspicious.

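
To see at a glance what the policy engine decided in an excerpt like this, the pengine lines can be pulled out with a short Python sketch. This is only an illustration and not part of Pacemaker: the function names and regular expressions are assumptions based on the log format shown here (version 1.1.5), and other versions may word these entries differently.

```python
import re

# Patterns modelled on the pengine entries in this excerpt only (an assumption,
# not a documented log format). "#011" is how syslog escapes a tab character.
LOG_ACTION = re.compile(
    r"pengine: \[\d+\]: notice: LogActions: (?P<action>\w+)\s+"
    r"(?P<resource>\S+?)(?:#011|\t)\((?P<detail>[^)]*)\)"
)
NODE_SHUTDOWN = re.compile(
    r"pengine: \[\d+\]: notice: stage6: Scheduling Node (?P<node>\S+) for shutdown"
)

# Three entries copied from the excerpt, used as sample input.
SAMPLE = "\n".join([
    "Jun 6 09:53:00 Server2 pengine: [2361]: notice: stage6: "
    "Scheduling Node Server4.LAN for shutdown",
    "Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: "
    "Leave ClusterIP#011(Started Server2.LAN)",
    "Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: "
    "Stop meatware-fence:2#011(Server4.LAN)",
])


def summarize_actions(log_text):
    """Return (action, resource, detail) for each LogActions entry found."""
    return [
        (m.group("action"), m.group("resource"), m.group("detail"))
        for m in LOG_ACTION.finditer(log_text)
    ]


def nodes_scheduled_for_shutdown(log_text):
    """Return the node names the policy engine scheduled for shutdown."""
    return [m.group("node") for m in NODE_SHUTDOWN.finditer(log_text)]


if __name__ == "__main__":
    print(nodes_scheduled_for_shutdown(SAMPLE))
    for action, resource, detail in summarize_actions(SAMPLE):
        print(action, resource, detail)
```

Run against the full excerpt, the only actions listed are three Leave entries and one Stop of meatware-fence:2, with Server4.LAN scheduled for shutdown.
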
Jeff