Hi all,

I'm (still) trying to learn pacemaker, using a pair of RHEL 7 beta VMs. I've got stonith configured and it technically works (the crashed node does reboot), but pacemaker itself hangs afterwards...

Here is the config:

====
Cluster Name: rhel7-pcmk
Corosync Nodes:
 rhel7-01.alteeve.ca rhel7-02.alteeve.ca
Pacemaker Nodes:
 rhel7-01.alteeve.ca rhel7-02.alteeve.ca

Resources:

Stonith Devices:
 Resource: fence_n01_virsh (class=stonith type=fence_virsh)
  Attributes: pcmk_host_list=rhel7-01 ipaddr=lemass action=reboot login=root passwd_script=/root/lemass.pw delay=15 port=rhel7_01
  Operations: monitor interval=60s (fence_n01_virsh-monitor-interval-60s)
 Resource: fence_n02_virsh (class=stonith type=fence_virsh)
  Attributes: pcmk_host_list=rhel7-02 ipaddr=lemass action=reboot login=root passwd_script=/root/lemass.pw port=rhel7_02
  Operations: monitor interval=60s (fence_n02_virsh-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-19.el7-368c726
 no-quorum-policy: ignore
 stonith-enabled: true
====
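
One thing I haven't ruled out is the host-name matching itself: the virsh domain names (rhel7_01 / rhel7_02) don't match the corosync node names, so maybe I need pcmk_host_map rather than just pcmk_host_list? This is only a sketch of what I'd try next, untested, and the exact pcs syntax may be off:

====
# Untested idea: explicitly map each cluster node name to its virsh guest name
pcs stonith update fence_n01_virsh pcmk_host_list="rhel7-01.alteeve.ca" \
    pcmk_host_map="rhel7-01.alteeve.ca:rhel7_01"
pcs stonith update fence_n02_virsh pcmk_host_list="rhel7-02.alteeve.ca" \
    pcmk_host_map="rhel7-02.alteeve.ca:rhel7_02"
====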

Here are the logs:

====
Dec 21 14:36:07 rhel7-01 corosync[1709]: [TOTEM ] A processor failed, forming new configuration.
Dec 21 14:36:09 rhel7-01 corosync[1709]: [TOTEM ] A new membership (192.168.122.101:24) was formed. Members left: 2
Dec 21 14:36:09 rhel7-01 corosync[1709]: [QUORUM] Members[1]: 1
Dec 21 14:36:09 rhel7-01 corosync[1709]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 21 14:36:09 rhel7-01 crmd[1730]: notice: crm_update_peer_state: pcmk_quorum_notification: Node rhel7-02.alteeve.ca[2] - state is now lost (was member)
Dec 21 14:36:09 rhel7-01 crmd[1730]: warning: reap_dead_nodes: Our DC node (rhel7-02.alteeve.ca) left the cluster
Dec 21 14:36:09 rhel7-01 crmd[1730]: notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=reap_dead_nodes ]
Dec 21 14:36:09 rhel7-01 pacemakerd[1724]: notice: crm_update_peer_state: pcmk_quorum_notification: Node rhel7-02.alteeve.ca[2] - state is now lost (was member)
Dec 21 14:36:09 rhel7-01 crmd[1730]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Dec 21 14:36:10 rhel7-01 attrd[1728]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 21 14:36:10 rhel7-01 attrd[1728]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 21 14:36:11 rhel7-01 pengine[1729]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 21 14:36:11 rhel7-01 pengine[1729]: warning: pe_fence_node: Node rhel7-02.alteeve.ca will be fenced because fence_n02_virsh is thought to be active there
Dec 21 14:36:11 rhel7-01 pengine[1729]: warning: custom_action: Action fence_n02_virsh_stop_0 on rhel7-02.alteeve.ca is unrunnable (offline)
Dec 21 14:36:11 rhel7-01 pengine[1729]: warning: stage6: Scheduling Node rhel7-02.alteeve.ca for STONITH
Dec 21 14:36:11 rhel7-01 pengine[1729]: notice: LogActions: Move fence_n02_virsh (Started rhel7-02.alteeve.ca -> rhel7-01.alteeve.ca)
Dec 21 14:36:11 rhel7-01 pengine[1729]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-2.bz2
Dec 21 14:36:11 rhel7-01 crmd[1730]: notice: te_fence_node: Executing reboot fencing operation (11) on rhel7-02.alteeve.ca (timeout=60000)
Dec 21 14:36:11 rhel7-01 stonith-ng[1726]: notice: handle_request: Client crmd.1730.4f6ea9e1 wants to fence (reboot) 'rhel7-02.alteeve.ca' with device '(any)'
Dec 21 14:36:11 rhel7-01 stonith-ng[1726]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for rhel7-02.alteeve.ca: ea720bbf-aeab-43bb-a196-3a4c091dea75 (0)
Dec 21 14:36:11 rhel7-01 stonith-ng[1726]: notice: can_fence_host_with_device: fence_n01_virsh can not fence rhel7-02.alteeve.ca: static-list
Dec 21 14:36:11 rhel7-01 stonith-ng[1726]: notice: can_fence_host_with_device: fence_n02_virsh can not fence rhel7-02.alteeve.ca: static-list
Dec 21 14:36:11 rhel7-01 stonith-ng[1726]: error: remote_op_done: Operation reboot of rhel7-02.alteeve.ca by rhel7-01.alteeve.ca for [email protected]: No such device
Dec 21 14:36:11 rhel7-01 crmd[1730]: notice: tengine_stonith_callback: Stonith operation 2/11:0:0:52e1fdf2-0b3a-42be-b7df-4d9dadb8d98b: No such device (-19)
Dec 21 14:36:11 rhel7-01 crmd[1730]: notice: tengine_stonith_callback: Stonith operation 2 for rhel7-02.alteeve.ca failed (No such device): aborting transition.
Dec 21 14:36:11 rhel7-01 crmd[1730]: notice: tengine_stonith_notify: Peer rhel7-02.alteeve.ca was not terminated (reboot) by rhel7-01.alteeve.ca for rhel7-01.alteeve.ca: No such device (ref=ea720bbf-aeab-43bb-a196-3a4c091dea75) by client crmd.1730
Dec 21 14:36:11 rhel7-01 crmd[1730]: notice: run_graph: Transition 0 (Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Stopped
Dec 21 14:36:11 rhel7-01 crmd[1730]: notice: too_many_st_failures: No devices found in cluster to fence rhel7-02.alteeve.ca, giving up
Dec 21 14:36:11 rhel7-01 crmd[1730]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
====
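
The "can_fence_host_with_device ... static-list" lines look like the heart of it: stonith-ng apparently doesn't believe either device can fence rhel7-02.alteeve.ca. I assume a query like the following would show what stonith-ng actually has registered for that target, though I haven't captured that output yet:

====
# Ask stonithd which devices it thinks can fence the lost node
stonith_admin --list rhel7-02.alteeve.ca

# And dump the registered stonith devices with their attributes
pcs stonith show --full
====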

I've tried both the full host names and the short host names in 'pcmk_host_list=', but got the same result both times.
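
Next I suppose I should test fencing outside of a crash, both by calling the agent directly and by asking the cluster to fence the node, to separate "agent problem" from "host matching problem". Roughly this, untested and with the option names from memory:

====
# Call the fence agent directly, bypassing pacemaker
fence_virsh --ip=lemass --username=root --password-script=/root/lemass.pw \
    --plug=rhel7_02 --action=status

# Then ask the cluster itself to fence the node
pcs stonith fence rhel7-02.alteeve.ca
====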

Versions:
====
pacemaker-1.1.10-19.el7.x86_64
pcs-0.9.99-2.el7.x86_64
====

Can someone hit me with a clustick?

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
