Hi,
I'm rather new to openais and have run into some issues with the order of
fencing, plus a refusal to fail over once one fencing method fails. Any help
would be much appreciated.
Even though I've set a lower priority on my fence_node2_ipmi device, it is
not used first; fence_node2_apc is picked instead (I also tried setting the
priorities the other way around, with no effect). Only when I delete
fence_node2_ipmi and add it again does it get used first. The second issue
I'm running into is that if fence_node2_ipmi fails (or fence_node2_apc, for
that matter), the cluster just keeps reattempting that same fencing device
over and over instead of moving on to the other one.
Also, every time it executes the reboot, the physical IPMI card issues a
restart, so the server ends up rebooting endlessly.
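To make the behaviour I expected concrete, here is a rough Python sketch. It
is only an illustration of my expectation (lowest priority value first, then
fall back to the next device on failure), not how stonith-ng is actually
implemented. The device names and priorities are the ones from my config; the
fence() and run() helpers and the pass/fail outcomes are made up for the
example.

```python
from typing import Callable, List, Optional, Tuple

# (device name, priority) for node2, as set in the configuration in this post
NODE2_DEVICES: List[Tuple[str, int]] = [
    ("fence_node2_apc", 100),
    ("fence_node2_ipmi", 10),
]


def fence(devices: List[Tuple[str, int]],
          run_device: Callable[[str], bool]) -> Optional[str]:
    """Try devices in ascending priority order.

    Returns the name of the first device that succeeds, or None if all fail.
    """
    for name, _priority in sorted(devices, key=lambda d: d[1]):
        if run_device(name):
            return name
    return None


# Hypothetical outcome: the IPMI device fails, the APC switch works.
outcomes = {"fence_node2_ipmi": False, "fence_node2_apc": True}
attempted: List[str] = []


def run(name: str) -> bool:
    """Stand-in for executing a fence agent; records the attempt order."""
    attempted.append(name)
    return outcomes[name]


result = fence(NODE2_DEVICES, run)
print("attempted:", attempted)
print("fenced by:", result)
```

In other words, I expected fence_node2_ipmi to be tried first and
fence_node2_apc to be tried once the IPMI attempt fails, rather than the same
device being retried.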
Log output:
Apr 06 19:05:39 node1 stonith-ng: [2591]: info: log_data_element:
process_remote_stonith_exec: ExecResult <st-reply
st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify"
st_remote_op="2311741c-fc3b-4094-badd-0ac9e10a209b" st_callid="0"
st_callopt="0" st_rc="1" st_output="Rebooting machine @
IPMI:192.168.1.161...Failed
" src="node1" seq="268" />
Apr 06 19:05:49 node1 stonith-ng: [2591]: ERROR: remote_op_timeout: Action
reboot (2311741c-fc3b-4094-badd-0ac9e10a209b) for node2 timed out
Apr 06 19:05:49 node1 stonith-ng: [2591]: info: remote_op_done: Notifing
clients of 2311741c-fc3b-4094-badd-0ac9e10a209b (reboot of node2 from
fc25a065-3355-455d-937f-360b07f9dda9 by (null)): 1, rc=-7
Apr 06 19:05:49 node1 stonith-ng: [2591]: info: stonith_notify_client: Sending
st_fence-notification to client 2596/17cafaec-7078-4972-937e-1cf5636c8523
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: initiate_remote_stonith_op:
Initiating remote operation reboot for node2:
e5e1a936-c038-42bc-acff-18c2a41e9ae2
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: log_data_element:
stonith_query: Query <stonith_command t="stonith-ng"
st_async_id="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_op="st_query"
st_callid="0" st_callopt="0"
st_remote_op="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_target="node2"
st_device_action="reboot" st_clientid="fc25a065-3355-455d-937f-360b07f9dda9"
src="node1" seq="269" />
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
fence_node2_ipmi can fence node2: static-list
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
fence_node2_apc can fence node2: static-list
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: stonith_query: Found 2 matching
devices for 'node2'
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: call_remote_stonith: Requesting
that node1 perform op reboot node2
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: log_data_element:
stonith_fence: Exec <stonith_command t="stonith-ng"
st_async_id="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_op="st_fence"
st_callid="0" st_callopt="0"
st_remote_op="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_target="node2"
st_device_action="reboot" src="node1" seq="271" />
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
fence_node2_ipmi can fence node2: static-list
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
fence_node2_apc can fence node2: static-list
Apr 06 19:05:50 node1 stonith-ng: [2591]: info: stonith_fence: Found 2 matching
devices for 'node2'
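In case it helps when reading the log, this is a small Python sketch that
pulls the result fields out of the ExecResult element and the device names
out of the can_fence_host_with_device lines. The sample strings are copied
from the log above with the mail line wrapping removed; the attrs() and
matched_device() helpers are my own and not part of any Pacemaker tooling.

```python
import re
from typing import Dict, Optional, Tuple

# ExecResult element copied from the log above (line wrapping removed)
EXEC_RESULT = (
    '<st-reply st_origin="stonith_construct_async_reply" t="stonith-ng" '
    'st_op="st_notify" st_remote_op="2311741c-fc3b-4094-badd-0ac9e10a209b" '
    'st_callid="0" st_callopt="0" st_rc="1" '
    'st_output="Rebooting machine @ IPMI:192.168.1.161...Failed\n" '
    'src="node1" seq="268" />'
)

# Device query lines copied from the log above (line wrapping removed)
QUERY_LINES = [
    "Apr 06 19:05:50 node1 stonith-ng: [2591]: info: "
    "can_fence_host_with_device: fence_node2_ipmi can fence node2: static-list",
    "Apr 06 19:05:50 node1 stonith-ng: [2591]: info: "
    "can_fence_host_with_device: fence_node2_apc can fence node2: static-list",
]


def attrs(element: str) -> Dict[str, str]:
    """Return the name="value" pairs of one logged XML element as a dict."""
    return dict(re.findall(r'(\w+)="([^"]*)"', element))


def matched_device(line: str) -> Optional[Tuple[str, str]]:
    """Return (device, target) from a can_fence_host_with_device line."""
    m = re.search(r"can_fence_host_with_device: (\S+) can fence (\S+):", line)
    return (m.group(1), m.group(2)) if m else None


fields = attrs(EXEC_RESULT)
print("st_rc:", fields["st_rc"])
print("st_output:", fields["st_output"].strip())

devices = [matched_device(line) for line in QUERY_LINES]
print("devices reported for node2:", [d[0] for d in devices if d])
```

For the excerpt above this shows st_rc="1" together with the "Failed" output
from the IPMI reboot, and both devices being reported as able to fence node2.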
I'm running Pacemaker version 1.1.2 (see dc-version in the config below).
Here is the relevant part of my cluster config:
node node1 \
    attributes standby="off"
node node2 \
    attributes standby="off"
primitive fence_node1 stonith:fence_ipmilan \
    params action="reboot" ipaddr="192.168.1.160" login="ADMIN" \
        passwd="ADMIN" pcmk_host_check="static-list" pcmk_host_list="node1"
primitive fence_node1_apc stonith:fence_apc_snmp \
    params ipaddr="192.168.1.180" action="reboot" port="node1" \
        community="private" pcmk_host_check="static-list" \
        pcmk_host_list="node1" priority="20"
primitive fence_node2_apc stonith:fence_apc_snmp \
    params ipaddr="192.168.1.180" action="reboot" port="node2" \
        community="private" pcmk_host_check="static-list" \
        pcmk_host_list="node2" priority="100"
primitive fence_node2_ipmi stonith:fence_ipmilan \
    params action="reboot" ipaddr="192.168.1.161" login="ADMIN" \
        passwd="ADMIN" pcmk_host_check="static-list" \
        pcmk_host_list="node2" priority="10"
location fence-node1_apc-on-node2 fence_node1_apc -inf: node1
location fence_node1-on-node2 fence_node1 -inf: node1
location fence_node2-on-node1 fence_node2_apc -inf: node2
location fence_node2_ipmi-on-node1 fence_node2_ipmi -inf: node2
property $id="cib-bootstrap-options" \
    dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="true" \
    no-quorum-policy="ignore" \
    stonith-timeout="30s"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
Best Regards,
Richard Cernava
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais