On 12 Jun 2014, at 4:55 am, Paul E Cain <pec...@us.ibm.com> wrote:

> Hello,
>
> Overview
> I'm experimenting with a small two-node Pacemaker cluster on two RHEL 7 VMs.
> One of the things I need to do is ensure that my cluster can connect to a
> certain IP address, 10.10.0.1, because once I add the actual resources that
> will need to be HA, those resources will need access to 10.10.0.1 for the
> cluster to function normally. To do that, I have one ocf:pacemaker:ping
> resource for each node to check that connectivity. If the ping fails, the
> node should go into standby mode and get fenced if possible. Additionally,
> when a node first comes up, I want that connectivity check to happen before
> the fencing agents come up or a STONITH happens, because a node should not
> try to take over cluster resources if it cannot connect to 10.10.0.1. To do
> this, I tried adding requires="nothing" and prereq="nothing" to all the
> operations for both ping resources. I also have two meatware fencing agents
> to use for testing. I'm using order constraints so they don't start until
> after the ping resources.
>
> Cluster When Functioning Normally
> [root@ha3 ~]# crm_mon -1
> Last updated: Wed Jun 11 13:10:54 2014
> Last change: Wed Jun 11 13:10:35 2014 via crmd on ha3
> Stack: corosync
> Current DC: ha3 (168427534) - partition with quorum
> Version: 1.1.10-9d39a6b
> 2 Nodes configured
> 4 Resources configured
>
>
> Online: [ ha3 ha4 ]
>
> ha3_fabric_ping (ocf::pacemaker:ping): Started ha3
> ha4_fabric_ping (ocf::pacemaker:ping): Started ha4
> fencing_route_to_ha3 (stonith:meatware): Started ha4
> fencing_route_to_ha4 (stonith:meatware): Started ha3
>
>
> Testing
> However, when I tested this by only starting up Pacemaker on ha3 and also
> preventing ha3 from connecting to 10.10.0.1, I found that ha3 would not start
> until after ha4 was STONITHed. What I was aiming for was for ha3_fabric_ping
> to fail to start, which would prevent the fencing agent from starting and
> therefore prevent any STONITH.
>
>
> Question
> Any ideas why this is not working as expected? It's my understanding that
> requires="nothing" should allow ha3_fabric_ping to start even before any
> fencing operations. Maybe I'm misunderstanding something.
It's because the entire node is in standby mode. Running crm_simulate against the cib.xml below shows:

  Node ha3 (168427534): standby (on-fail)

In the config I see:

  <op name="monitor" interval="15s" requires="nothing" on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">

and in the status section:

  <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation_key="ha3_fabric_ping_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641" last-rc-change="1402509641" exec-time="20043" queue-time="0" op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>

Note the rc-code="1": the earlier start of ha3_fabric_ping failed, and the start op for that resource is also configured with on-fail="standby". That combination, a failed operation on a resource whose operations carry on-fail="standby", put the node into standby and prevented the resources from starting. requires="nothing" only controls what an action needs before it is allowed to run; on-fail controls what happens when it fails. (Two sketches follow at the end of this message, after the quoted config and logs: how to reproduce this check with crm_simulate, and a more common way to tie resources to connectivity.)

>
> Thanks for any help you can offer.
>
> Below are the software versions, cibadmin -Q, the /var/log/messages on
> ha3 during my test, and my corosync.conf file.
>
> Tell me if you need any more information.
>
> Software Versions (All Compiled From Source From The Website of the
> Respective Projects)
> Cluster glue 1.0.11
> libqb 0.17.0
> Corosync 2.3.3
> Pacemaker 1.1.11
> Resource Agents 3.9.5
> crmsh 2.0
>
> cibadmin -Q
> <cib epoch="204" num_updates="18" admin_epoch="0" > validate-with="pacemaker-1.2" cib-last-written="Wed Jun 11 12:56:50 2014" > crm_feature_set="3.0.8" update-origin="ha3" update-client="crm_resource" > have-quorum="1" dc-uuid="168427534"> > <configuration> > <crm_config> > <cluster_property_set id="cib-bootstrap-options"> > <nvpair name="symmetric-cluster" value="true" > id="cib-bootstrap-options-symmetric-cluster"/> > <nvpair name="stonith-enabled" value="true" > id="cib-bootstrap-options-stonith-enabled"/> > <nvpair name="stonith-action" value="reboot" > id="cib-bootstrap-options-stonith-action"/> > <nvpair name="no-quorum-policy" value="ignore" > id="cib-bootstrap-options-no-quorum-policy"/> > <nvpair name="stop-orphan-resources" value="true" > id="cib-bootstrap-options-stop-orphan-resources"/> > <nvpair name="stop-orphan-actions" value="true" > id="cib-bootstrap-options-stop-orphan-actions"/> > <nvpair name="default-action-timeout" value="20s" > id="cib-bootstrap-options-default-action-timeout"/> > <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" > value="1.1.10-9d39a6b"/> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" > name="cluster-infrastructure" value="corosync"/> > </cluster_property_set> > </crm_config> > <nodes> > <node id="168427534" uname="ha3"/> > <node id="168427535" uname="ha4"/> > </nodes> > <resources> > <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" > type="ping"> > <instance_attributes id="ha3_fabric_ping-instance_attributes"> > <nvpair name="host_list" value="10.10.0.1" > id="ha3_fabric_ping-instance_attributes-host_list"/> > <nvpair name="failure_score" value="1" > id="ha3_fabric_ping-instance_attributes-failure_score"/> > </instance_attributes> > <operations> > <op name="start" timeout="60s" requires="nothing" on-fail="standby" > interval="0" id="ha3_fabric_ping-start-0"> > <instance_attributes > id="ha3_fabric_ping-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha3_fabric_ping-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" interval="15s" requires="nothing" > on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s"> > <instance_attributes > 
id="ha3_fabric_ping-monitor-15s-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="stop" on-fail="fence" requires="nothing" interval="0" > id="ha3_fabric_ping-stop-0"> > <instance_attributes > id="ha3_fabric_ping-stop-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > <meta_attributes id="ha3_fabric_ping-meta_attributes"> > <nvpair id="ha3_fabric_ping-meta_attributes-requires" > name="requires" value="nothing"/> > </meta_attributes> > </primitive> > <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker" > type="ping"> > <instance_attributes id="ha4_fabric_ping-instance_attributes"> > <nvpair name="host_list" value="10.10.0.1" > id="ha4_fabric_ping-instance_attributes-host_list"/> > <nvpair name="failure_score" value="1" > id="ha4_fabric_ping-instance_attributes-failure_score"/> > </instance_attributes> > <operations> > <op name="start" timeout="60s" requires="nothing" on-fail="standby" > interval="0" id="ha4_fabric_ping-start-0"> > <instance_attributes > id="ha4_fabric_ping-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha4_fabric_ping-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" interval="15s" requires="nothing" > on-fail="standby" timeout="15s" id="ha4_fabric_ping-monitor-15s"> > <instance_attributes > id="ha4_fabric_ping-monitor-15s-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="stop" on-fail="fence" requires="nothing" interval="0" > id="ha4_fabric_ping-stop-0"> > <instance_attributes > id="ha4_fabric_ping-stop-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > <meta_attributes id="ha4_fabric_ping-meta_attributes"> > <nvpair id="ha4_fabric_ping-meta_attributes-requires" > name="requires" value="nothing"/> > </meta_attributes> > </primitive> > <primitive id="fencing_route_to_ha3" class="stonith" type="meatware"> > <instance_attributes id="fencing_route_to_ha3-instance_attributes"> > <nvpair name="hostlist" value="ha3" > id="fencing_route_to_ha3-instance_attributes-hostlist"/> > </instance_attributes> > <operations> > <op name="start" requires="nothing" interval="0" > id="fencing_route_to_ha3-start-0"> > <instance_attributes > id="fencing_route_to_ha3-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" requires="nothing" interval="0" > id="fencing_route_to_ha3-monitor-0"> > <instance_attributes > id="fencing_route_to_ha3-monitor-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > </primitive> > <primitive id="fencing_route_to_ha4" class="stonith" type="meatware"> > <instance_attributes id="fencing_route_to_ha4-instance_attributes"> > <nvpair name="hostlist" value="ha4" > id="fencing_route_to_ha4-instance_attributes-hostlist"/> > </instance_attributes> > <operations> > <op name="start" requires="nothing" interval="0" > id="fencing_route_to_ha4-start-0"> > 
<instance_attributes > id="fencing_route_to_ha4-start-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > <op name="monitor" requires="nothing" interval="0" > id="fencing_route_to_ha4-monitor-0"> > <instance_attributes > id="fencing_route_to_ha4-monitor-0-instance_attributes"> > <nvpair name="prereq" value="nothing" > id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/> > </instance_attributes> > </op> > </operations> > </primitive> > </resources> > <constraints> > <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping" > score="INFINITY" node="ha3"/> > <rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping" > score="-INFINITY" node="ha4"/> > <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping" > score="INFINITY" node="ha4"/> > <rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping" > score="-INFINITY" node="ha3"/> > <rsc_location id="fencing_route_to_ha4_location" > rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/> > <rsc_location id="fencing_route_to_ha4_not_location" > rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/> > <rsc_location id="fencing_route_to_ha3_location" > rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/> > <rsc_location id="fencing_route_to_ha3_not_location" > rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/> > <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4" > score="INFINITY" first="ha3_fabric_ping" first-action="start" > then="fencing_route_to_ha4" then-action="start"/> > <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3" > score="INFINITY" first="ha4_fabric_ping" first-action="start" > then="fencing_route_to_ha3" then-action="start"/> > </constraints> > <rsc_defaults> > <meta_attributes id="rsc-options"> > <nvpair name="resource-stickiness" value="INFINITY" > id="rsc-options-resource-stickiness"/> > <nvpair name="migration-threshold" value="0" > id="rsc-options-migration-threshold"/> > <nvpair name="is-managed" value="true" id="rsc-options-is-managed"/> > </meta_attributes> > </rsc_defaults> > </configuration> > <status> > <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online" > crm-debug-origin="do_update_resource" join="member" expected="member"> > <lrm id="168427534"> > <lrm_resources> > <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" > provider="pacemaker"> > <lrm_rsc_op id="ha3_fabric_ping_last_0" > operation_key="ha3_fabric_ping_stop_0" operation="stop" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:0;4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="19" rc-code="0" op-status="0" interval="0" last-run="1402509661" > last-rc-change="1402509661" exec-time="12" queue-time="0" > op-digest="91b00b3fe95f23582466d18e42c4fd58"/> > <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" > operation_key="ha3_fabric_ping_start_0" operation="start" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641" > last-rc-change="1402509641" exec-time="20043" queue-time="0" > op-digest="ddf4bee6852a62c7efcf52cf7471d629"/> > </lrm_resource> > <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf" > provider="pacemaker"> > 
<lrm_rsc_op id="ha4_fabric_ping_last_0" > operation_key="ha4_fabric_ping_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:7;5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="9" > rc-code="7" op-status="0" interval="0" last-run="1402509565" > last-rc-change="1402509565" exec-time="10" queue-time="0" > op-digest="91b00b3fe95f23582466d18e42c4fd58"/> > </lrm_resource> > <lrm_resource id="fencing_route_to_ha3" type="meatware" > class="stonith"> > <lrm_rsc_op id="fencing_route_to_ha3_last_0" > operation_key="fencing_route_to_ha3_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:7;6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402509565" > last-rc-change="1402509565" exec-time="1" queue-time="0" > op-digest="502fbd7a2366c2be772d7fbecc9e0351"/> > </lrm_resource> > <lrm_resource id="fencing_route_to_ha4" type="meatware" > class="stonith"> > <lrm_rsc_op id="fencing_route_to_ha4_last_0" > operation_key="fencing_route_to_ha4_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" > transition-key="7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > transition-magic="0:7;7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" > call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402509565" > last-rc-change="1402509565" exec-time="0" queue-time="0" > op-digest="5be26fbcfd648e3d545d0115645dde76"/> > </lrm_resource> > </lrm_resources> > </lrm> > <transient_attributes id="168427534"> > <instance_attributes id="status-168427534"> > <nvpair id="status-168427534-shutdown" name="shutdown" value="0"/> > <nvpair id="status-168427534-probe_complete" name="probe_complete" > value="true"/> > <nvpair id="status-168427534-fail-count-ha3_fabric_ping" > name="fail-count-ha3_fabric_ping" value="INFINITY"/> > <nvpair id="status-168427534-last-failure-ha3_fabric_ping" > name="last-failure-ha3_fabric_ping" value="1402509661"/> > </instance_attributes> > </transient_attributes> > </node_state> > <node_state id="168427535" in_ccm="false" crmd="offline" join="down" > crm-debug-origin="send_stonith_update" uname="ha4" expected="down"/> > </status> > </cib> > [root@ha3 ~]# > > > /var/log/messages from when pacemaker started on ha3 to when ha3_fabric_ping > failed. > Jun 11 12:59:01 ha3 systemd: Starting LSB: Starts and stops Pacemaker Cluster > Manager.... 
> Jun 11 12:59:01 ha3 pacemaker: Starting Pacemaker Cluster Manager > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: mcp_read_config: Configured > corosync to accept connections from group 1000: OK (1) > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: main: Starting Pacemaker 1.1.10 > (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc lha-fencing > nagios corosync-native libesmtp > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427534 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: cluster_connect_quorum: Quorum > acquired > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Defaulting to > uname -n for the local corosync node name > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node ha3[168427534] - state is now member (was > (null)) > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to > get node name for nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427535 > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node (null)[168427535] - state is now member (was > (null)) > Jun 11 12:59:02 ha3 pengine[5013]: warning: crm_is_writable: > /var/lib/pacemaker/pengine should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 cib[5009]: warning: crm_is_writable: > /var/lib/pacemaker/cib should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 cib[5009]: notice: crm_cluster_connect: Connecting to > cluster infrastructure: corosync > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: crm_cluster_connect: Connecting > to cluster infrastructure: corosync > Jun 11 12:59:02 ha3 crmd[5014]: notice: main: CRM Git Version: 9d39a6b > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: > /var/lib/pacemaker/pengine should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: > /var/lib/pacemaker/cib should be owned and r/w by group haclient > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_cluster_connect: Connecting to > cluster infrastructure: corosync > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427534 > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_update_peer_state: > attrd_peer_change_cb: Node (null)[168427534] - state is now member (was > (null)) > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname > -n for the local corosync node name > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:02 ha3 
stonith-ng[5010]: notice: get_node_name: Could not obtain > a node name for corosync nodeid 168427534 > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to > uname -n for the local corosync node name > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node > name for nodeid 168427534 > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Could not obtain a node > name for corosync nodeid 168427534 > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node > name for nodeid 168427534 > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_cluster_connect: Connecting to > cluster infrastructure: corosync > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427534 > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: setup_cib: Watching for stonith > topology changes > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:03 ha3 crmd[5014]: notice: cluster_connect_quorum: Quorum > acquired > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node ha3[168427534] - state is now member (was > (null)) > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a > node name for corosync nodeid 168427535 > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: > pcmk_quorum_notification: Node (null)[168427535] - state is now member (was > (null)) > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_started: The local CRM is > operational > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_state_transition: State transition > S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL > origin=do_started ] > Jun 11 12:59:04 ha3 stonith-ng[5010]: notice: stonith_device_register: Added > 'fencing_route_to_ha4' to the device list (1 active devices) > Jun 11 12:59:06 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ] > Jun 11 12:59:06 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster > Manager.. 
> Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_DC_TIMEOUT from > crm_timer_popped() received in state S_PENDING > Jun 11 12:59:24 ha3 crmd[5014]: notice: do_state_transition: State transition > S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED > origin=election_timeout_popped ] > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_ELECTION_DC > from do_election_check() received in state S_INTEGRATION > Jun 11 12:59:24 ha3 cib[5009]: notice: corosync_node_name: Unable to get node > name for nodeid 168427534 > Jun 11 12:59:24 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n > for the local corosync node name > Jun 11 12:59:24 ha3 attrd[5012]: notice: corosync_node_name: Unable to get > node name for nodeid 168427534 > Jun 11 12:59:24 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname > -n for the local corosync node name > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 2 with > 1 changes for terminate, id=<n/a>, set=(null) > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 3 with > 1 changes for shutdown, id=<n/a>, set=(null) > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 2 for > terminate[ha3]=(null): OK (0) > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 3 for > shutdown[ha3]=0: OK (0) > Jun 11 12:59:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 12:59:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for > STONITH > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start > ha3_fabric_ping (ha3) > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start > fencing_route_to_ha4 (ha3) > Jun 11 12:59:25 ha3 pengine[5013]: warning: process_pe_message: Calc ulated > Transition 0: /var/lib/pacemaker/pengine/pe-warn-80.bz2 > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: > monitor ha3_fabric_ping_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot > fencing operation (12) on ha4 (timeout=60000) > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: handle_request: Client > crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)' > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: > Initiating remote operation reboot for ha4: > b3ab6141-9612-4024-82b2-350e74bbb33d (0) > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to > get node name for nodeid 168427534 > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to > uname -n for the local corosync node name > Jun 11 12:59:25 ha3 stonith: [5027]: info: parse config info info=ha4 > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: > fencing_route_to_ha4 can fence ha4: dynamic-list > Jun 11 12:59:25 ha3 stonith: [5031]: info: parse config info info=ha4 > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: OPERATOR INTERVENTION REQUIRED to > reset ha4. > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: Run "meatclient -c ha4" AFTER > power-cycling the machine. 
> Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not > running > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 5: > monitor ha4_fabric_ping_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not > running > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 6: > monitor fencing_route_to_ha3_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 7: > monitor fencing_route_to_ha4_monitor_0 on ha3 (local) > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 3: > probe_complete probe_complete on ha3 (local) - no waiting > Jun 11 12:59:25 ha3 attrd[5012]: notice: write_attribute: Sent update 4 with > 1 changes for probe_complete, id=<n/a>, set=(null) > Jun 11 12:59:25 ha3 attrd[5012]: notice: attrd_cib_callback: Update 4 for > probe_complete[ha3]=true: OK (0) > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_action_async_done: > Child process 5030 performing action 'reboot' timed out with signal 15 > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: log_operation: Operation > 'reboot' [5030] (call 2 from crmd.5014) for host 'ha4' with device > 'fencing_route_to_ha4' returned: -62 (Timer expired) > Jun 11 13:00:25 ha3 stonith-ng[5010]: warning: log_operation: > fencing_route_to_ha4:5030 [ Performing: stonith -t meatware -T reset ha4 ] > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_choose_peer: Couldn't > find anyone to fence ha4 with <any> > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: remote_op_done: Operation reboot > of ha4 by ha3 for crmd.5014@ha3.b3ab6141: No route to host > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith > operation 2/12:0:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: No route to host > (-113) > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith > operation 2 for ha4 failed (No route to host): aborting transition. 
> Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was > not terminated (reboot) by ha3 for ha3: No route to host > (ref=b3ab6141-9612-4024-82b2-350e74bbb33d) by client crmd.5014 > Jun 11 13:00:25 ha3 crmd[5014]: notice: run_graph: Transition 0 (Complete=7, > Pending=0, Fired=0, Skipped=5, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-warn-80.bz2): Stopped > Jun 11 13:00:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:00:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for > STONITH > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start > ha3_fabric_ping (ha3) > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start > fencing_route_to_ha4 (ha3) > Jun 11 13:00:25 ha3 pengine[5013]: warning: process_pe_message: Calculated > Transition 1: /var/lib/pacemaker/pengine/pe-warn-81.bz2 > Jun 11 13:00:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot > fencing operation (8) on ha4 (timeout=60000) > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: handle_request: Client > crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)' > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: > Initiating remote operation reboot for ha4: > eae78d4c-8d80-47fe-93e9-1a9261ec38a4 (0) > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: > fencing_route_to_ha4 can fence ha4: dynamic-list > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: > fencing_route_to_ha4 can fence ha4: dynamic-list > Jun 11 13:00:25 ha3 stonith: [5057]: info: parse config info info=ha4 > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: OPERATOR INTERVENTION REQUIRED to > reset ha4. > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: Run "meatclient -c ha4" AFTER > power-cycling the machine. > Jun 11 13:00:41 ha3 stonith: [5057]: info: node Meatware-reset: ha4 > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: log_operation: Operation > 'reboot' [5056] (call 3 from crmd.5014) for host 'ha4' with device > 'fencing_route_to_ha4' returned: 0 (OK) > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: remote_op_done: Operation > reboot of ha4 by ha3 for crmd.5014@ha3.eae78d4c: OK > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith > operation 3/8:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: OK (0) > Jun 11 13:00:41 ha3 crmd[5014]: notice: crm_update_peer_state: > send_stonith_update: Node ha4[0] - state is now lost (was (null)) > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was > terminated (reboot) by ha3 for ha3: OK > (ref=eae78d4c-8d80-47fe-93e9-1a9261ec38a4) by client crmd.5014 > Jun 11 13:00:41 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: > start ha3_fabric_ping_start_0 on ha3 (local) > Jun 11 13:01:01 ha3 systemd: Starting Session 22 of user root. > Jun 11 13:01:01 ha3 systemd: Started Session 22 of user root. > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 5 with > 1 changes for pingd, id=<n/a>, set=(null) > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 5 for > pingd[ha3]=0: OK (0) > Jun 11 13:01:01 ha3 ping(ha3_fabric_ping)[5060]: WARNING: pingd is less than > failure_score(1) > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true) > unknown error > Jun 11 13:01:01 ha3 crmd[5014]: warning: status_from_rc: Action 4 > (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. 
rc: 1): Error > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount > for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, > time=1402509661) > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount > for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, > time=1402509661) > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 1 (Complete=4, > Pending=0, Fired=0, Skipped=2, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-warn-81.bz2): Stopped > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 6 with > 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null) > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 7 with > 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null) > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing > failed op start for ha3_fabric_ping on ha3: unknown error (1) > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop > ha3_fabric_ping (ha3) > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated > Transition 2: /var/lib/pacemaker/pengine/pe-input-304.bz2 > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 6 for > fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0) > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 7 for > last-failure-ha3_fabric_ping[ha3]=1402509661: OK (0) > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing > failed op start for ha3_fabric_ping on ha3: unknown error (1) > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop > ha3_fabric_ping (ha3) > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated > Transition 3: /var/lib/pacemaker/pengine/pe-input-305.bz2 > Jun 11 13:01:01 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: > stop ha3_fabric_ping_stop_0 on ha3 (local) > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation > ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 3 (Complete=2, > Pending=0, Fired=0, Skipped=0, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-input-305.bz2): Complete > Jun 11 13:01:01 ha3 crmd[5014]: notice: do_state_transition: State transition > S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL > origin=notify_crmd ] > Jun 11 13:01:06 ha3 attrd[5012]: notice: write_attribute: Sent update 8 with > 1 changes for pingd, id=<n/a>, set=(null) > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition > S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL > origin=abort_transition_graph ] > Jun 11 13:01:06 ha3 pengine[5013]: notice: unpack_config: On loss of CCM > Quorum: Ignore > Jun 11 13:01:06 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing > failed op start for ha3_fabric_ping on ha3: unknown error (1) > Jun 11 13:01:06 ha3 pengine[5013]: notice: process_pe_message: Calculated > Transition 4: /var/lib/pacemaker/pengine/pe-input-306.bz2 > Jun 11 13:01:06 ha3 crmd[5014]: notice: run_graph: Transition 4 (Complete=0, > Pending=0, Fired=0, Skipped=0, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-input-306.bz2): Complete > Jun 11 13:01:06 ha3 crmd[5014]: 
notice: do_state_transition: State transition > S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL > origin=notify_crmd ] > Jun 11 13:01:06 ha3 attrd[5012]: notice: attrd_cib_callback: Update 8 for > pingd[ha3]=(null): OK (0) > > /etc/corosync/corosync.conf > # Please read the corosync.conf.5 manual page > totem { > version: 2 > > crypto_cipher: none > crypto_hash: none > > interface { > ringnumber: 0 > bindnetaddr: 10.10.0.0 > mcastport: 5405 > ttl: 1 > } > transport: udpu > } > > logging { > fileline: off > to_logfile: no > to_syslog: yes > #logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: QUORUM > debug: off > } > } > > nodelist { > node { > ring0_addr: 10.10.0.14 > } > > node { > ring0_addr: 10.10.0.15 > } > } > > quorum { > # Enable and configure quorum subsystem (default: off) > # see also corosync.conf.5 and votequorum.5 > provider: corosync_votequorum > expected_votes: 2 > } > [root@ha3 ~]# > > Paul Cain > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org
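For reference, here is a minimal sketch of the check I ran above, assuming you save the CIB you posted (or a fresh dump) locally as cib.xml; the filename is just an example:

  # dump the live CIB to a file (this is the same output you already posted)
  cibadmin -Q > cib.xml

  # replay the policy engine against that CIB without touching the cluster
  crm_simulate --xml-file cib.xml --simulate

In the "Current cluster status" part of the output you should see the line that explains what you observed:

  Node ha3 (168427534): standby (on-fail)

Once ha3 can reach 10.10.0.1 again, clearing the failed start should bring it out of that on-fail standby, e.g. "crm resource cleanup ha3_fabric_ping" with crmsh, or "crm_resource --cleanup --resource ha3_fabric_ping".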
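As an aside, and not necessarily what you want here: the more usual way to express "don't run anything on a node that cannot reach 10.10.0.1" is to let ocf:pacemaker:ping publish its score into the pingd node attribute (which it does by default; your log shows pingd[ha3]=0) and tie the other resources to that attribute with location rules, rather than relying on on-fail="standby". A rough crmsh sketch using your resource names (untested, and the constraint ids are made up):

  # keep each fencing resource (and, later, the real resources) off any node
  # whose connectivity check has not (yet) succeeded
  location fencing_route_to_ha4_needs_fabric fencing_route_to_ha4 \
      rule -inf: not_defined pingd or pingd lte 0
  location fencing_route_to_ha3_needs_fabric fencing_route_to_ha3 \
      rule -inf: not_defined pingd or pingd lte 0

With rules like that, a node that cannot ping 10.10.0.1 simply never becomes eligible to run those resources, whether or not its ping resource has "failed", and you avoid the standby side effect entirely.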