I seem to have found a situation where pacemaker (pacemaker-1.1.7-6.el6.x86_64) refuses to stop (i.e. "service pacemaker stop" never completes) on EL6.
The status of the two-node cluster was that the node being asked to stop (node2) was continually trying to stonith the other node (node1), which was not (yet) running corosync/pacemaker. The reason node2 was looping around the stonith operation was that no stonith resource had (yet) been set up that could fence node1. The log on node2 simply repeats this over and over again:

stonith-ng[20695]: error: remote_op_done: Operation reboot of node1 by <no-one> for node2[d4e76f3a-42ed-4576-975e-b805ac30c04a]: Operation timed out
crmd[20699]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" />
crmd[20699]: notice: tengine_stonith_callback: Stonith operation 110 for node1 failed (Operation timed out): aborting transition.
crmd[20699]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
crmd[20699]: notice: tengine_stonith_notify: Peer node1 was not terminated (reboot) by <anyone> for node2: Operation timed out (ref=18e93407-4efa-4b97-99e1-b331591598ef)
crmd[20699]: notice: run_graph: ==== Transition 108 (Complete=2, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pengine/pe-warn-3.bz2): Stopped
crmd[20699]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
pengine[20698]: notice: unpack_config: On loss of CCM Quorum: Ignore
pengine[20698]: warning: stage6: Scheduling Node node1 for STONITH
pengine[20698]: notice: stage6: Scheduling Node node2 for shutdown
pengine[20698]: notice: LogActions: Stop st-fencing#011(node2)
pengine[20698]: warning: process_pe_message: Transition 109: WARNINGs found during PE processing.
PEngine Input stored in: /var/lib/pengine/pe-warn-3.bz2
crmd[20699]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
pengine[20698]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
crmd[20699]: info: do_te_invoke: Processing graph 109 (ref=pe_calc-dc-1361624958-120) derived from /var/lib/pengine/pe-warn-3.bz2
crmd[20699]: notice: te_fence_node: Executing reboot fencing operation (7) on node1 (timeout=60000)
stonith-ng[20695]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 96b06897-5ba7-46c3-b9d2-797113df2812
stonith-ng[20695]: info: can_fence_host_with_device: Refreshing port list for st-fencing
stonith-ng[20695]: info: can_fence_host_with_device: st-fencing can not fence node1: dynamic-list
stonith-ng[20695]: info: stonith_command: Processed st_query from node2: rc=0

While that repeats, "service pacemaker stop" is producing:

node2# service pacemaker stop
Signaling Pacemaker Cluster Manager to terminate:          [  OK  ]
Waiting for cluster services to unload:.......................................................

I suppose this will continue forever until I either manually force pacemaker down or fix up the cluster configuration so the stonith operation can succeed. In an environment where pacemaker is being controlled by another process, this is clearly an undesirable situation.
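For the record, the only escape hatches I can think of are sketched below. These are assumptions on my part, not something I have verified in this exact state: either tell stonith-ng that node1 is already safely down so the pending fence completes, or force the daemons down outright.

```shell
# Workaround sketch (assumptions, untested in this state):

# 1. If node1 really is powered off, manually acknowledge the fence so
#    the pending stonith operation completes and shutdown can proceed:
stonith_admin --confirm node1

# 2. Last resort: force the pacemaker daemons down. The init script's
#    graceful stop is what hangs, so kill pacemakerd and its children:
killall -TERM pacemakerd
sleep 10
killall -KILL pacemakerd crmd pengine stonith-ng lrmd attrd cib
```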
Is this behavior (the shutdown hanging while pacemaker spins trying to stonith) expected? Cheers, b.
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org