I seem to have found a situation where pacemaker (pacemaker-1.1.7-6.el6.x86_64) refuses to stop (i.e. "service pacemaker stop" never completes) on EL6.
The status of the two-node cluster was that the node being asked to stop (node2) was continually trying to stonith the other node (node1), which was not (yet) running corosync/pacemaker. The reason node2 was looping around the stonith operation was that no stonith resource had (yet) been set up that could fence node1. The log on node2 simply repeats this over and over again:

stonith-ng[20695]: error: remote_op_done: Operation reboot of node1 by <no-one> for node2[d4e76f3a-42ed-4576-975e-b805ac30c04a]: Operation timed out
crmd[20699]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" />
crmd[20699]: notice: tengine_stonith_callback: Stonith operation 110 for node1 failed (Operation timed out): aborting transition.
crmd[20699]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
crmd[20699]: notice: tengine_stonith_notify: Peer node1 was not terminated (reboot) by <anyone> for node2: Operation timed out (ref=18e93407-4efa-4b97-99e1-b331591598ef)
crmd[20699]: notice: run_graph: ==== Transition 108 (Complete=2, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pengine/pe-warn-3.bz2): Stopped
crmd[20699]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
pengine[20698]: notice: unpack_config: On loss of CCM Quorum: Ignore
pengine[20698]: warning: stage6: Scheduling Node node1 for STONITH
pengine[20698]: notice: stage6: Scheduling Node node2 for shutdown
pengine[20698]: notice: LogActions: Stop st-fencing#011(node2)
pengine[20698]: warning: process_pe_message: Transition 109: WARNINGs found during PE processing.
PEngine Input stored in: /var/lib/pengine/pe-warn-3.bz2
crmd[20699]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
pengine[20698]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
crmd[20699]: info: do_te_invoke: Processing graph 109 (ref=pe_calc-dc-1361624958-120) derived from /var/lib/pengine/pe-warn-3.bz2
crmd[20699]: notice: te_fence_node: Executing reboot fencing operation (7) on node1 (timeout=60000)
stonith-ng[20695]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 96b06897-5ba7-46c3-b9d2-797113df2812
stonith-ng[20695]: info: can_fence_host_with_device: Refreshing port list for st-fencing
stonith-ng[20695]: info: can_fence_host_with_device: st-fencing can not fence node1: dynamic-list
stonith-ng[20695]: info: stonith_command: Processed st_query from node2: rc=0

While that repeats, "service pacemaker stop" is producing:

node2# service pacemaker stop
Signaling Pacemaker Cluster Manager to terminate:          [  OK  ]
Waiting for cluster services to unload:.......................................................

I suppose this will continue forever until I either manually force pacemaker down or fix up the cluster configuration so the stonith operation can succeed. In an environment where pacemaker is being controlled by another process, this is clearly an undesirable situation.
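For the record, the only escape hatches I can think of are sketched below. These are assumptions on my part, not something I have verified in this exact state: either tell stonith-ng that node1 is already safely down so the pending fence completes, or force the daemons down outright.

```shell
# Workaround sketch (assumptions, untested in this state):

# 1. If node1 really is powered off, manually acknowledge the fence so
#    the pending stonith operation completes and shutdown can proceed:
stonith_admin --confirm node1

# 2. Last resort: force the pacemaker daemons down. The init script's
#    graceful stop is what hangs, so kill pacemakerd and its children:
killall -TERM pacemakerd
sleep 10
killall -KILL pacemakerd crmd pengine stonith-ng lrmd attrd cib
```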
Is this behavior (the shutdown hanging while pacemaker spins trying to stonith) expected? Cheers, b.
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org