Hello - Am thinking that this is progress.

Have made some updates, but still getting the same result ("require others to stonith node st15-mds1").


Referencing this link for the updates made:
http://www.hastexo.com/resources/hints-and-kinks/fencing-libvirtkvm-virtualized-cluster-nodes

Updates include removing the previous 'primitive st-nodes' entry and adding the 
following:


primitive stonith_st15-mds1 stonith:external/libvirt \
        params hostlist="st15-mds1" hypervisor_uri="qemu+ssh://wc0008/system" stonith-timeout="30" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60"
primitive stonith_st15-mds2 stonith:external/libvirt \
        params hostlist="st15-mds2" hypervisor_uri="qemu+ssh://wc0008/system" stonith-timeout="30" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60"
location l_stonith_st15-mds1 stonith_st15-mds1 -inf: st15-mds1
location l_stonith_st15-mds2 stonith_st15-mds2 -inf: st15-mds2
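
In case it is useful, the plan is also to verify the device by hand from st15-mds2 before letting the cluster drive it - roughly along these lines (this assumes passwordless root SSH from the cluster nodes to wc0008, which the qemu+ssh URI needs, and that the stonith CLI from cluster-glue is installed):

# confirm the hypervisor URI itself works from the node that has to do the fencing
virsh -c qemu+ssh://wc0008/system list --all

# ask the plugin directly to reset the peer, passing the same parameters as name=value pairs
stonith -t external/libvirt -T reset hostlist="st15-mds1" hypervisor_uri="qemu+ssh://wc0008/system" st15-mds1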

Any suggestions would certainly be appreciated.  Thanks!

 
Brett Lee
Everything Penguin - http://etpenguin.com




>________________________________
> From: Brett Lee <brett...@yahoo.com>
>To: "pacemaker@oss.clusterlabs.org" <pacemaker@oss.clusterlabs.org> 
>Sent: Friday, June 29, 2012 9:43 AM
>Subject: [Pacemaker] newb - stonith not working - require others to stonith node
> 
>
>Hi - 
>
>
>
>Am new to pacemaker and now have a shiny new configuration that will not 
>stonith.  This is a test system using KVM and external/libvirt - all VMs are 
>running CentOS 5.
>
>Am (really) hoping someone might be willing to help troubleshoot this 
>configuration.  Thank you for your time and effort!
>
>
>
>The items that are suspect to me are:
>1.  st-nodes has no 'location' entry
>2.  logs report node_list=
>3.  resource st-nodes is Stopped
>
>Have attached a clip of the configuration below.  The full configuration and 
>log file may be found at - http://pastebin.com/bS87FXUr
>
>
>Per 'stonith -t external/libvirt -h' I have configured stonith using:
>
>
>primitive st-nodes stonith:external/libvirt \
>        params hostlist="st15-mds1,st15-mds2,st15-oss1,st15-oss2" hypervisor_uri="qemu+ssh://wc0008/system" stonith-timeout="30" \
>        op start interval="0" timeout="60" \
>        op stop interval="0" timeout="60" \
>        op monitor interval="60"
>
>And a section of the log file is:
>
>
>Jun 29 11:02:07 st15-mds2 stonithd: [4485]: ERROR: Failed to STONITH the node 
>st15-mds1: optype=RESET, op_result=TIMEOUT
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: tengine_stonith_callback: 
>call=-65, optype=1, node_name=st15-mds1, result=2, node_list=, 
>action=23:90:0:aac961e7-b06b-4dfd-ae60-c882407b16b5
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: ERROR: tengine_stonith_callback: 
>Stonith of st15-mds1 failed (2)... aborting transition.
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: abort_transition_graph: 
>tengine_stonith_callback:409 - Triggered transition abort (complete=0) : 
>Stonith failed
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: update_abort_priority: Abort 
>priority upgraded from 0 to 1000000
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: update_abort_priority: Abort action done superceeded by restart
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: run_graph: 
>====================================================
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: notice: run_graph: Transition 90 
>(Complete=2, Pending=0, Fired=0, Skipped=5, Incomplete=0, 
>Source=/var/lib/pengine/pe-warn-173.bz2): Stopped
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_graph_trigger: Transition 90 
>is now complete
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: State 
>transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC 
>cause=C_FSA_INTERNAL origin=notify_crmd ]
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: All 3 
>cluster nodes are eligible to run resources.
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_pe_invoke: Query 299: 
>Requesting the current CIB: S_POLICY_ENGINE
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_pe_invoke_callback: Invoking 
>the PE: query=299, ref=pe_calc-dc-1340982127-223, seq=396, quorate=1
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: unpack_config: Node scores: 
>'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node 
>st15-mds2 is online
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: pe_fence_node: Node st15-mds1 
>will be fenced because it is un-expectedly down
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: 
>determine_online_status_fencing:     ha_state=active, ccm_state=false, 
>crm_state=online, join_state=member, expected=member
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: determine_online_status: Node 
>st15-mds1 is unclean
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node 
>st15-oss1 is online
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node 
>st15-oss2 is online
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0000    (ocf::heartbeat:Filesystem):    Started st15-oss1
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: 
>lustre-OST0001    (ocf::heartbeat:Filesystem):    Started st15-oss1
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: 
>lustre-OST0002    (ocf::heartbeat:Filesystem):    Started st15-oss2
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: 
>lustre-OST0003    (ocf::heartbeat:Filesystem):    Started st15-oss2
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: 
>lustre-MDT0000    (ocf::heartbeat:Filesystem):    Started st15-mds1
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: st-nodes    
>(stonith:external/libvirt):    Stopped 
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: native_color: Resource st-nodes cannot run anywhere
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: custom_action: Action 
>lustre-MDT0000_stop_0 on st15-mds1 is unrunnable (offline)
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: custom_action: Marking node 
>st15-mds1 unclean
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: RecurringOp:  Start 
>recurring monitor (120s) for lustre-MDT0000 on st15-mds2
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: stage6: Scheduling Node 
>st15-mds1 for STONITH
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: native_stop_constraints: 
>lustre-MDT0000_stop_0 is implicit after st15-mds1 is fenced
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave   
>resource lustre-OST0000    (Started st15-oss1)
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave   
>resource lustre-OST0001    (Started st15-oss1)
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave   resource lustre-OST0002    (Started st15-oss2)
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave   
>resource lustre-OST0003    (Started st15-oss2)
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Move    
>resource lustre-MDT0000    (Started st15-mds1 -> st15-mds2)
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave   
>resource st-nodes    (Stopped)
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: State 
>transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
>cause=C_IPC_MESSAGE origin=handle_response ]
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: process_pe_message: 
>Transition 91: WARNINGs found during PE processing. PEngine Input stored in: 
>/var/lib/pengine/pe-warn-174.bz2
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: unpack_graph: Unpacked transition 91: 7 actions in 7 synapses
>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: process_pe_message: 
>Configuration WARNINGs found during PE processing.  Please run "crm_verify -L" 
>to identify issues.
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_te_invoke: Processing graph 
>91 (ref=pe_calc-dc-1340982127-223) derived from 
>/var/lib/pengine/pe-warn-174.bz2
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_pseudo_action: Pseudo action 
>21 fired and confirmed
>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_fence_node: Executing reboot 
>fencing operation (23) on st15-mds1 (timeout=60000)
>Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: client tengine [pid: 4490] 
>requests a STONITH operation RESET on node st15-mds1
>Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: we can't manage st15-mds1, 
>broadcast request to other nodes
>Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: Broadcasting the message succeeded: require others to stonith node st15-mds1.
>
>Thank you!
>
> 
>Brett Lee
>Everything Penguin - http://etpenguin.com
>
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
