On 12 Jun 2014, at 4:55 am, Paul E Cain <pec...@us.ibm.com> wrote:

> Hello,
>
> Overview
> I'm experimenting with a small two-node Pacemaker cluster on two RHEL 7 VMs.
> One of the things I need to do is ensure that my cluster can connect to a
> certain IP address, 10.10.0.1, because once I add the actual resources that
> will need to be HA, those resources will need access to 10.10.0.1 for the
> cluster to function normally. To do that, I have one ocf:pacemaker:ping
> resource for each node to check that connectivity. If the ping fails, the
> node should go into standby mode and get fenced if possible. Additionally,
> when a node first comes up, I want that connectivity check to happen before
> the fencing agents come up or a STONITH happens, because a node should not
> try to take over cluster resources if it cannot connect to 10.10.0.1. To do
> this, I tried adding requires="nothing" and prereq="nothing" to all the
> operations for both ping resources. I also have two meatware fencing agents
> to use for testing. I'm using order constraints so they don't start until
> after the ping resources.
>
> Cluster When Functioning Normally
> [root@ha3 ~]# crm_mon -1
> Last updated: Wed Jun 11 13:10:54 2014
> Last change: Wed Jun 11 13:10:35 2014 via crmd on ha3
> Stack: corosync
> Current DC: ha3 (168427534) - partition with quorum
> Version: 1.1.10-9d39a6b
> 2 Nodes configured
> 4 Resources configured
>
>
> Online: [ ha3 ha4 ]
>
> ha3_fabric_ping (ocf::pacemaker:ping): Started ha3
> ha4_fabric_ping (ocf::pacemaker:ping): Started ha4
> fencing_route_to_ha3 (stonith:meatware): Started ha4
> fencing_route_to_ha4 (stonith:meatware): Started ha3
>
>
> Testing
> However, when I tested this by only starting up Pacemaker on ha3 and also
> preventing ha3 from connecting to 10.10.0.1, I found that ha3 would not start
> until after ha4 was STONITHed. What I was aiming for was for ha3_fabric_ping
> to fail to start, which would prevent the fencing agent from starting and
> therefore prevent any STONITH.
>
>
> Question
> Any ideas why this is not working as expected? It's my understanding that
> requires="nothing" should allow ha3_fabric_ping to start even before any
> fencing operations. Maybe I'm misunderstanding something.
It's because the entire node is in standby mode. Running crm_simulate against the cib.xml below shows:

  Node ha3 (168427534): standby (on-fail)

In the config I see:

  <op name="monitor" interval="15s" requires="nothing" on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">

and in the status section:

  <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation_key="ha3_fabric_ping_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641" last-rc-change="1402509641" exec-time="20043" queue-time="0" op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>

Note the rc-code="1": the earlier start of ha3_fabric_ping failed, and the start op for that resource is also configured with on-fail="standby". That combination, a failed operation on a resource whose operations carry on-fail="standby", put the node into standby and prevented the resources from starting. requires="nothing" only controls what an action needs before it is allowed to run; on-fail controls what happens when it fails. (Two sketches follow at the end of this message, after the quoted config and logs: how to reproduce this check with crm_simulate, and a more common way to tie resources to connectivity.)

>
> Thanks for any help you can offer.
>
> Below are the software versions, cibadmin -Q, the /var/log/messages on
> ha3 during my test, and my corosync.conf file.
>
> Tell me if you need any more information.
>
> Software Versions (All Compiled From Source From The Website of the
> Respective Projects)
> Cluster glue 1.0.11
> libqb 0.17.0
> Corosync 2.3.3
> Pacemaker 1.1.11
> Resource Agents 3.9.5
> crmsh 2.0
>
> cibadmin -Q
> <cib epoch="204" num_updates="18" admin_epoch="0" > validate-with="pacemaker-1.2" cib-last-written="Wed Jun 11 12:56:50 2014" > crm_feature_set="3.0.8" update-origin="ha3" update-client="crm_resource" > have-quorum="1" dc-uuid="168427534"> > <configuration> > <crm_config> > <cluster_property_set id="cib-bootstrap-options"> > <nvpair name="symmetric-cluster" value="true" > id="cib-bootstrap-options-symmetric-cluster"/> > <nvpair name="stonith-enabled" value="true" > id="cib-bootstrap-options-stonith-enabled"/> > <nvpair name="stonith-action" value="reboot" > id="cib-bootstrap-options-stonith-action"/> > <nvpair name="no-quorum-policy" value="ignore" > id="cib-bootstrap-options-no-quorum-policy"/> > <nvpair name="stop-orphan-resources" value="true" > id="cib-bootstrap-options-stop-orphan-resources"/> > <nvpair name="stop-orphan-actions" value="true" > id="cib-bootstrap-options-stop-orphan-actions"/> > <nvpair name="default-action-timeout" value="20s" > id="cib-bootstrap-options-default-action-timeout"/> > <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" > value="1.1.10-9d39a6b"/> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" > name="cluster-infrastructure" value="corosync"/> > </cluster_property_set> > </crm_config> > <nodes> > <node id="168427534" uname="ha3"/> > <node id="168427535" uname="ha4"/> > </nodes> > <resources> > <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" > type="ping"> > <instance_attributes id="ha3_fabric_ping-instance_attributes"> > <nvpair name="host_list" value="10.10.0.1" > id="ha3_fabric_ping-instance_attributes-host_list"/> > <nvpair name="failure_score" value="1" > id="ha3_fabric_ping-instance_attributes-failure_score"/> > </instance_attributes> > <operations> > <op name="start" timeout="60s" requires="nothing" on-fail="standby" > interval="0" id="ha3_fabric_ping-start-0"> > <instance_attributes > id="ha3_fabric_ping-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha3_fabric_ping-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" interval="15s" requires="nothing" > on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s"> > <instance_attributes > 
id="ha3_fabric_ping-monitor-15s-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="stop" on-fail="fence" requires="nothing" interval="0" > id="ha3_fabric_ping-stop-0"> > <instance_attributes > id="ha3_fabric_ping-stop-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > <meta_attributes id="ha3_fabric_ping-meta_attributes"> > <nvpair id="ha3_fabric_ping-meta_attributes-requires" > name="requires" value="nothing"/> > </meta_attributes> > </primitive> > <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker" > type="ping"> > <instance_attributes id="ha4_fabric_ping-instance_attributes"> > <nvpair name="host_list" value="10.10.0.1" > id="ha4_fabric_ping-instance_attributes-host_list"/> > <nvpair name="failure_score" value="1" > id="ha4_fabric_ping-instance_attributes-failure_score"/> > </instance_attributes> > <operations> > <op name="start" timeout="60s" requires="nothing" on-fail="standby" > interval="0" id="ha4_fabric_ping-start-0"> > <instance_attributes > id="ha4_fabric_ping-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha4_fabric_ping-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" interval="15s" requires="nothing" > on-fail="standby" timeout="15s" id="ha4_fabric_ping-monitor-15s"> > <instance_attributes > id="ha4_fabric_ping-monitor-15s-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="stop" on-fail="fence" requires="nothing" interval="0" > id="ha4_fabric_ping-stop-0"> > <instance_attributes > id="ha4_fabric_ping-stop-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > <meta_attributes id="ha4_fabric_ping-meta_attributes"> > <nvpair id="ha4_fabric_ping-meta_attributes-requires" > name="requires" value="nothing"/> > </meta_attributes> > </primitive> > <primitive id="fencing_route_to_ha3" class="stonith" type="meatware"> > <instance_attributes id="fencing_route_to_ha3-instance_attributes"> > <nvpair name="hostlist" value="ha3" > id="fencing_route_to_ha3-instance_attributes-hostlist"/> > </instance_attributes> > <operations> > <op name="start" requires="nothing" interval="0" > id="fencing_route_to_ha3-start-0"> > <instance_attributes > id="fencing_route_to_ha3-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" requires="nothing" interval="0" > id="fencing_route_to_ha3-monitor-0"> > <instance_attributes > id="fencing_route_to_ha3-monitor-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > </primitive> > <primitive id="fencing_route_to_ha4" class="stonith" type="meatware"> > <instance_attributes id="fencing_route_to_ha4-instance_attributes"> > <nvpair name="hostlist" value="ha4" > id="fencing_route_to_ha4-instance_attributes-hostlist"/> > </instance_attributes> > <operations> > <op name="start" requires="nothing" interval="0" > id="fencing_route_to_ha4-start-0"> > 
<instance_attributes > id="fencing_route_to_ha4-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" requires="nothing" interval="0" > id="fencing_route_to_ha4-monitor-0"> > <instance_attributes > id="fencing_route_to_ha4-monitor-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > </primitive> > </resources> > <constraints> > <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping" > score="INFINITY" node="ha3"/> > <rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping" > score="-INFINITY" node="ha4"/> > <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping" > score="INFINITY" node="ha4"/> > <rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping" > score="-INFINITY" node="ha3"/> > <rsc_location id="fencing_route_to_ha4_location" > rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/> > <rsc_location id="fencing_route_to_ha4_not_location" > rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/> > <rsc_location id="fencing_route_to_ha3_location" > rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/> > <rsc_location id="fencing_route_to_ha3_not_location" > rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/> > <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4" > score="INFINITY" first="ha3_fabric_ping" first-action="start" > then="fencing_route_to_ha4" then-action="start"/> > <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3" > score="INFINITY" first="ha4_fabric_ping" first-action="start" > then="fencing_route_to_ha3" then-action="start"/> > </constraints> > <rsc_defaults> > <meta_attributes id="rsc-options"> > <nvpair name="resource-stickiness" value="INFINITY" > id="rsc-options-resource-stickiness"/> > <nvpair name="migration-threshold" value="0" > id="rsc-options-migration-threshold"/> > <nvpair name="is-managed" value="true" id="rsc-options-is-managed"/> > </meta_attributes> > </rsc_defaults> > </configuration> > <status> > <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online" > crm-debug-origin="do_update_resource" join="member" expected="member"> > <lrm id="168427534"> > <lrm_resources> > <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" > provider="pacemaker"> > <lrm_rsc_op id="ha3_fabric_ping_last_0" > operation_key="ha3_fabric_ping_stop_0" operation="stop" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:0;4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="19" rc-code="0" op-status="0" interval="0" last-run="1402509661" > last-rc-change="1402509661" exec-time="12" queue-time="0" > op-digest="91b00b3fe95f23582466d18e42c4fd58"/> > <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" > operation_key="ha3_fabric_ping_start_0" operation="start" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641" > last-rc-change="1402509641" exec-time="20043" queue-time="0" > op-digest="ddf4bee6852a62c7efcf52cf7471d629"/> > </lrm_resource> > <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf" > provider="pacemaker"> > 
<lrm_rsc_op id="ha4_fabric_ping_last_0" > operation_key="ha4_fabric_ping_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:7;5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="9" > rc-code="7" op-status="0" interval="0" last-run="1402509565" > last-rc-change="1402509565" exec-time="10" queue-time="0" > op-digest="91b00b3fe95f23582466d18e42c4fd58"/> > </lrm_resource> > <lrm_resource id="fencing_route_to_ha3" type="meatware" > class="stonith"> > <lrm_rsc_op id="fencing_route_to_ha3_last_0" > operation_key="fencing_route_to_ha3_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:7;6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402509565" > last-rc-change="1402509565" exec-time="1" queue-time="0" > op-digest="502fbd7a2366c2be772d7fbecc9e0351"/> > </lrm_resource> > <lrm_resource id="fencing_route_to_ha4" type="meatware" > class="stonith"> > <lrm_rsc_op id="fencing_route_to_ha4_last_0" > operation_key="fencing_route_to_ha4_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:7;7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402509565" > last-rc-change="1402509565" exec-time="0" queue-time="0" > op-digest="5be26fbcfd648e3d545d0115645dde76"/> > </lrm_resource> > </lrm_resources> > </lrm> > <transient_attributes id="168427534"> > <instance_attributes id="status-168427534"> > <nvpair id="status-168427534-shutdown" name="shutdown" value="0"/> > <nvpair id="status-168427534-probe_complete" name="probe_complete" > value="true"/> > <nvpair id="status-168427534-fail-count-ha3_fabric_ping" > name="fail-count-ha3_fabric_ping" value="INFINITY"/> > <nvpair id="status-168427534-last-failure-ha3_fabric_ping" > name="last-failure-ha3_fabric_ping" value="1402509661"/> > </instance_attributes> > </transient_attributes> > </node_state> > <node_state id="168427535" in_ccm="false" crmd="offline" join="down" > crm-debug-origin="send_stonith_update" uname="ha4" expected="down"/> > </status> > </cib> > [root@ha3 ~]# > > > /var/log/messages from when pacemaker started on ha3 to when ha3_fabric_ping > failed. > Jun 11 12:59:01 ha3 systemd: Starting LSB: Starts and stops Pacemaker Cluster > Manager.... 
> Jun 11 12:59:01 ha3 pacemaker: Starting Pacemaker Cluster Manager > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: mcp_read_config: Configured > corosync to accept connections from group 1000: OK (1) > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: main: Starting Pacemaker 1.1.10 > (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc lha-fencing > nagios corosync-native libesmtp > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427534 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: cluster_connect_quorum: Quorum > acquired > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Defaulting to > uname -n for the local corosync node name > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node ha3[168427534] - state is now member (was > (null)) > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node (null)[168427535] - state is now member (was > (null)) > Jun 11 12:59:02 ha3 pengine[5013]: warning: crm_is_writable: > /var/lib/pacemaker/pengine should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 cib[5009]: warning: crm_is_writable: > /var/lib/pacemaker/cib should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 cib[5009]: notice: crm_cluster_connect: Connecting to > cluster infrastructure: corosync > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: crm_cluster_connect: Connecting > to cluster infrastructure: corosync > Jun 11 12:59:02 ha3 crmd[5014]: notice: main: CRM Git Version: 9d39a6b > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: > /var/lib/pacemaker/pengine should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: > /var/lib/pacemaker/cib should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_cluster_connect: Connecting to > cluster infrastructure: corosync > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427534 > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_update_peer_state: > attrd_peer_change_cb: Node (null)[168427534] - state is now member (was > (null)) > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname > -n for the local corosync node name > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:02 ha3 
stonith-ng[5010]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427534 > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to > uname -n for the local corosync node name > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node > name for nodeid 168427534 > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Could not obtain a node > name for corosync nodeid 168427534 > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node > name for nodeid 168427534 > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_cluster_connect: Connecting to > cluster infrastructure: corosync > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427534 > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: setup_cib: Watching for stonith > topology changes > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:03 ha3 crmd[5014]: notice: cluster_connect_quorum: Quorum > acquired > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node ha3[168427534] - state is now member (was > (null)) > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node (null)[168427535] - state is now member (was > (null)) > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_started: The local CRM is > operational > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_state_transition: State transition > S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL > origin=do_started ] > Jun 11 12:59:04 ha3 stonith-ng[5010]: notice: stonith_device_register: Added > 'fencing_route_to_ha4' to the device list (1 active devices) > Jun 11 12:59:06 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ] > Jun 11 12:59:06 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster > Manager.. 
> Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_DC_TIMEOUT from > crm_timer_popped() received in state S_PENDING > Jun 11 12:59:24 ha3 crmd[5014]: notice: do_state_transition: State transition > S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED > origin=election_timeout_popped ] > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_ELECTION_DC > from do_election_check() received in state S_INTEGRATION > Jun 11 12:59:24 ha3 cib[5009]: notice: corosync_node_name: Unable to get node > name for nodeid 168427534 > Jun 11 12:59:24 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:24 ha3 attrd[5012]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:24 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname > -n for the local corosync node name > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 2 with > 1 changes for terminate, id=<n/a>, set=(null) > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 3 with > 1 changes for shutdown, id=<n/a>, set=(null) > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 2 for > terminate[ha3]=(null): OK (0) > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 3 for > shutdown[ha3]=0: OK (0) > Jun 11 12:59:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 12:59:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for > STONITH > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start > ha3_fabric_ping (ha3) > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start > fencing_route_to_ha4 (ha3) > Jun 11 12:59:25 ha3 pengine[5013]: warning: process_pe_message: Calc ulated > Transition 0: /var/lib/pacemaker/pengine/pe-warn-80.bz2 > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: > monitor ha3_fabric_ping_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot > fencing operation (12) on ha4 (timeout=60000) > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: handle_request: Client > crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)' > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: > Initiating remote operation reboot for ha4: > b3ab6141-9612-4024-82b2-350e74bbb33d (0) > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to > uname -n for the local corosync node name > Jun 11 12:59:25 ha3 stonith: [5027]: info: parse config info info=ha4 > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: > fencing_route_to_ha4 can fence ha4: dynamic-list > Jun 11 12:59:25 ha3 stonith: [5031]: info: parse config info info=ha4 > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: OPERATOR INTERVENTION REQUIRED to > reset ha4. > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: Run "meatclient -c ha4" AFTER > power-cycling the machine. 
> Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not > running > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 5: > monitor ha4_fabric_ping_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not > running > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 6: > monitor fencing_route_to_ha3_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 7: > monitor fencing_route_to_ha4_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 3: > probe_complete probe_complete on ha3 (local) - no waiting > Jun 11 12:59:25 ha3 attrd[5012]: notice: write_attribute: Sent update 4 with > 1 changes for probe_complete, id=<n/a>, set=(null) > Jun 11 12:59:25 ha3 attrd[5012]: notice: attrd_cib_callback: Update 4 for > probe_complete[ha3]=true: OK (0) > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_action_async_done: > Child process 5030 performing action 'reboot' timed out with signal 15 > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: log_operation: Operation > 'reboot' [5030] (call 2 from crmd.5014) for host 'ha4' with device > 'fencing_route_to_ha4' returned: -62 (Timer expired) > Jun 11 13:00:25 ha3 stonith-ng[5010]: warning: log_operation: > fencing_route_to_ha4:5030 [ Performing: stonith -t meatware -T reset ha4 ] > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_choose_peer: Couldn't > find anyone to fence ha4 with <any> > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: remote_op_done: Operation reboot > of ha4 by ha3 for crmd.5014@ha3.b3ab6141: No route to host > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith > operation 2/12:0:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: No route to host > (-113) > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith > operation 2 for ha4 failed (No route to host): aborting transition. 
> Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was > not terminated (reboot) by ha3 for ha3: No route to host > (ref=b3ab6141-9612-4024-82b2-350e74bbb33d) by client crmd.5014 > Jun 11 13:00:25 ha3 crmd[5014]: notice: run_graph: Transition 0 (Complete=7, > Pending=0, Fired=0, Skipped=5, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-warn-80.bz2): Stopped > Jun 11 13:00:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:00:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for > STONITH > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start > ha3_fabric_ping (ha3) > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start > fencing_route_to_ha4 (ha3) > Jun 11 13:00:25 ha3 pengine[5013]: warning: process_pe_message: Calculated > Transition 1: /var/lib/pacemaker/pengine/pe-warn-81.bz2 > Jun 11 13:00:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot > fencing operation (8) on ha4 (timeout=60000) > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: handle_request: Client > crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)' > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: > Initiating remote operation reboot for ha4: > eae78d4c-8d80-47fe-93e9-1a9261ec38a4 (0) > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: > fencing_route_to_ha4 can fence ha4: dynamic-list > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: > fencing_route_to_ha4 can fence ha4: dynamic-list > Jun 11 13:00:25 ha3 stonith: [5057]: info: parse config info info=ha4 > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: OPERATOR INTERVENTION REQUIRED to > reset ha4. > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: Run "meatclient -c ha4" AFTER > power-cycling the machine. > Jun 11 13:00:41 ha3 stonith: [5057]: info: node Meatware-reset: ha4 > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: log_operation: Operation > 'reboot' [5056] (call 3 from crmd.5014) for host 'ha4' with device > 'fencing_route_to_ha4' returned: 0 (OK) > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: remote_op_done: Operation > reboot of ha4 by ha3 for crmd.5014@ha3.eae78d4c: OK > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith > operation 3/8:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: OK (0) > Jun 11 13:00:41 ha3 crmd[5014]: notice: crm_update_peer_state: > send_stonith_update: Node ha4[0] - state is now lost (was (null)) > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was > terminated (reboot) by ha3 for ha3: OK > (ref=eae78d4c-8d80-47fe-93e9-1a9261ec38a4) by client crmd.5014 > Jun 11 13:00:41 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: > start ha3_fabric_ping_start_0 on ha3 (local) > Jun 11 13:01:01 ha3 systemd: Starting Session 22 of user root. > Jun 11 13:01:01 ha3 systemd: Started Session 22 of user root. > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 5 with > 1 changes for pingd, id=<n/a>, set=(null) > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 5 for > pingd[ha3]=0: OK (0) > Jun 11 13:01:01 ha3 ping(ha3_fabric_ping)[5060]: WARNING: pingd is less than > failure_score(1) > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true) > unknown error > Jun 11 13:01:01 ha3 crmd[5014]: warning: status_from_rc: Action 4 > (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. 
rc: 1): Error > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount > for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, > time=1402509661) > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount > for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, > time=1402509661) > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 1 (Complete=4, > Pending=0, Fired=0, Skipped=2, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-warn-81.bz2): Stopped > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 6 with > 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null) > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 7 with > 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null) > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing > failed op start for ha3_fabric_ping on ha3: unknown error (1) > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop > ha3_fabric_ping (ha3) > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated > Transition 2: /var/lib/pacemaker/pengine/pe-input-304.bz2 > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 6 for > fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0) > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 7 for > last-failure-ha3_fabric_ping[ha3]=1402509661: OK (0) > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing > failed op start for ha3_fabric_ping on ha3: unknown error (1) > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop > ha3_fabric_ping (ha3) > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated > Transition 3: /var/lib/pacemaker/pengine/pe-input-305.bz2 > Jun 11 13:01:01 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: > stop ha3_fabric_ping_stop_0 on ha3 (local) > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 3 (Complete=2, > Pending=0, Fired=0, Skipped=0, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-input-305.bz2): Complete > Jun 11 13:01:01 ha3 crmd[5014]: notice: do_state_transition: State transition > S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL > origin=notify_crmd ] > Jun 11 13:01:06 ha3 attrd[5012]: notice: write_attribute: Sent update 8 with > 1 changes for pingd, id=<n/a>, set=(null) > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition > S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL > origin=abort_transition_graph ] > Jun 11 13:01:06 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:01:06 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing > failed op start for ha3_fabric_ping on ha3: unknown error (1) > Jun 11 13:01:06 ha3 pengine[5013]: notice: process_pe_message: Calculated > Transition 4: /var/lib/pacemaker/pengine/pe-input-306.bz2 > Jun 11 13:01:06 ha3 crmd[5014]: notice: run_graph: Transition 4 (Complete=0, > Pending=0, Fired=0, Skipped=0, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-input-306.bz2): Complete > Jun 11 13:01:06 ha3 crmd[5014]: 
notice: do_state_transition: State transition > S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL > origin=notify_crmd ] > Jun 11 13:01:06 ha3 attrd[5012]: notice: attrd_cib_callback: Update 8 for > pingd[ha3]=(null): OK (0) > > /etc/corosync/corosync.conf > # Please read the corosync.conf.5 manual page > totem { > version: 2 > > crypto_cipher: none > crypto_hash: none > > interface { > ringnumber: 0 > bindnetaddr: 10.10.0.0 > mcastport: 5405 > ttl: 1 > } > transport: udpu > } > > logging { > fileline: off > to_logfile: no > to_syslog: yes > #logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: QUORUM > debug: off > } > } > > nodelist { > node { > ring0_addr: 10.10.0.14 > } > > node { > ring0_addr: 10.10.0.15 > } > } > > quorum { > # Enable and configure quorum subsystem (default: off) > # see also corosync.conf.5 and votequorum.5 > provider: corosync_votequorum > expected_votes: 2 > } > [root@ha3 ~]# > > Paul Cain > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org
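For reference, here is a minimal sketch of the check I ran above, assuming you save the CIB you posted (or a fresh dump) locally as cib.xml; the filename is just an example:

  # dump the live CIB to a file (this is the same output you already posted)
  cibadmin -Q > cib.xml

  # replay the policy engine against that CIB without touching the cluster
  crm_simulate --xml-file cib.xml --simulate

In the "Current cluster status" part of the output you should see the line that explains what you observed:

  Node ha3 (168427534): standby (on-fail)

Once ha3 can reach 10.10.0.1 again, clearing the failed start should bring it out of that on-fail standby, e.g. "crm resource cleanup ha3_fabric_ping" with crmsh, or "crm_resource --cleanup --resource ha3_fabric_ping".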
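As an aside, and not necessarily what you want here: the more usual way to express "don't run anything on a node that cannot reach 10.10.0.1" is to let ocf:pacemaker:ping publish its score into the pingd node attribute (which it does by default; your log shows pingd[ha3]=0) and tie the other resources to that attribute with location rules, rather than relying on on-fail="standby". A rough crmsh sketch using your resource names (untested, and the constraint ids are made up):

  # keep each fencing resource (and, later, the real resources) off any node
  # whose connectivity check has not (yet) succeeded
  location fencing_route_to_ha4_needs_fabric fencing_route_to_ha4 \
      rule -inf: not_defined pingd or pingd lte 0
  location fencing_route_to_ha3_needs_fabric fencing_route_to_ha3 \
      rule -inf: not_defined pingd or pingd lte 0

With rules like that, a node that cannot ping 10.10.0.1 simply never becomes eligible to run those resources, whether or not its ping resource has "failed", and you avoid the standby side effect entirely.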