On Thu, Apr 7, 2011 at 4:12 AM, Richard <[email protected]> wrote:
> Hi,
> I'm rather new to openais and have run into some issues with the order of
> fencing plus refusal to failover once one fencing method fails. Any help
> would be much appreciated.
> Even though I've set priority lower on my fence_node2_ipmi device it will
> not fence first. But fence_node2_apc is picked (also tried setting the
> priority the other way, no effect). Only when I delete fence_node2_ipmi and
> add it again does it get used first.

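For reference, the behaviour being asked for here (try the devices in priority order and move on to the next one when a device fails) could be sketched roughly as below. This is only an illustration of the expectation, not stonith-ng code. `FenceDevice`, `fence_with_fallback` and `lowest_first` are made-up names, and whether a lower or a higher number should win is an assumption exposed as a flag.

```python
from typing import Callable, List, NamedTuple, Optional


class FenceDevice(NamedTuple):
    """Hypothetical stand-in for a configured fencing device."""
    name: str
    priority: int
    fence: Callable[[str], bool]  # returns True if the target was fenced


def fence_with_fallback(target: str,
                        devices: List[FenceDevice],
                        lowest_first: bool = True) -> Optional[str]:
    """Try each device once, in priority order; return the name that worked.

    lowest_first is an assumption about how the priority number is read;
    flip it if a higher number should be tried first.
    """
    ordered = sorted(devices, key=lambda d: d.priority,
                     reverse=not lowest_first)
    for dev in ordered:
        if dev.fence(target):
            return dev.name
        # This device failed: fall through to the next one instead of
        # retrying the same device again.
    return None  # every device failed


# Illustration using the device names and priorities from this thread.
attempts: List[str] = []


def ipmi_fails(target: str) -> bool:
    attempts.append("fence_node2_ipmi")
    return False  # simulate the failing IPMI reboot


def apc_works(target: str) -> bool:
    attempts.append("fence_node2_apc")
    return True


devices = [
    FenceDevice("fence_node2_apc", 100, apc_works),
    FenceDevice("fence_node2_ipmi", 10, ipmi_fails),
]
winner = fence_with_fallback("node2", devices)
print(winner, attempts)
```

With the simulated IPMI failure, the APC device is tried exactly once after the IPMI device, which is the failover being asked about.
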
stonith in 1.1.x doesn't strictly observe the priorities. It's one of the
things we need to fix soon.

> The second issue I'm running into is
> that if fence_node2_ipmi fails OR fence_node2_apc for that matter it just
> keeps reattempting that same fencing device over and over again.

It shouldn't do that - can you file a bug and include a crm_report archive
please?

> Also every time it executes the reboot the physical ipmi card is issuing a
> restart and the server is endlessly rebooting.
> Log output:
> Apr 06 19:05:39 node1 stonith-ng: [2591]: info: log_data_element:
> process_remote_stonith_exec: ExecResult <st-reply
> st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify"
> st_remote_op="2311741c-fc3b-4094-badd-0ac9e10a209b" st_callid="0"
> st_callopt="0" st_rc="1" st_output="Rebooting machine @
> IPMI:192.168.1.161...Failed
> " src="node1" seq="268" />
> Apr 06 19:05:49 node1 stonith-ng: [2591]: ERROR: remote_op_timeout: Action
> reboot (2311741c-fc3b-4094-badd-0ac9e10a209b) for node2 timed out
> Apr 06 19:05:49 node1 stonith-ng: [2591]: info: remote_op_done: Notifing
> clients of 2311741c-fc3b-4094-badd-0ac9e10a209b (reboot of node2 from
> fc25a065-3355-455d-937f-360b07f9dda9 by (null)): 1, rc=-7
> Apr 06 19:05:49 node1 stonith-ng: [2591]: info: stonith_notify_client:
> Sending st_fence-notification to client
> 2596/17cafaec-7078-4972-937e-1cf5636c8523
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: initiate_remote_stonith_op:
> Initiating remote operation reboot for node2:
> e5e1a936-c038-42bc-acff-18c2a41e9ae2
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: log_data_element:
> stonith_query: Query <stonith_command t="stonith-ng"
> st_async_id="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_op="st_query"
> st_callid="0" st_callopt="0"
> st_remote_op="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_target="node2"
> st_device_action="reboot" st_clientid="fc25a065-3355-455d-937f-360b07f9dda9"
> src="node1" seq="269" />
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
> fence_node2_ipmi can fence node2: static-list
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
> fence_node2_apc can fence node2: static-list
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: stonith_query: Found 2
> matching devices for 'node2'
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: call_remote_stonith:
> Requesting that node1 perform op reboot node2
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: log_data_element:
> stonith_fence: Exec <stonith_command t="stonith-ng"
> st_async_id="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_op="st_fence"
> st_callid="0" st_callopt="0"
> st_remote_op="e5e1a936-c038-42bc-acff-18c2a41e9ae2" st_target="node2"
> st_device_action="reboot" src="node1" seq="271" />
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
> fence_node2_ipmi can fence node2: static-list
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: can_fence_host_with_device:
> fence_node2_apc can fence node2: static-list
> Apr 06 19:05:50 node1 stonith-ng: [2591]: info: stonith_fence: Found 2
> matching devices for 'node2'
>
> I'm running version: 1.1.2.
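
If it helps when writing up the bug, the retry loop can be quantified from the stonith-ng log with something like the sketch below. It assumes each log entry sits on a single line (not wrapped as in the quoted mail) and that the message wording matches the `initiate_remote_stonith_op` and `remote_op_timeout` lines shown above. `summarize_fencing` is just an illustrative name, not an existing tool.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Patterns modelled on the log lines quoted in this thread (an assumption;
# other Pacemaker versions may word these messages differently).
INITIATE = re.compile(
    r"initiate_remote_stonith_op: Initiating remote operation "
    r"(\w+) for ([^:\s]+): ([0-9a-f-]+)"
)
TIMEOUT = re.compile(
    r"remote_op_timeout: Action (\w+) \(([0-9a-f-]+)\) for (\S+) timed out"
)


def summarize_fencing(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count started and timed-out fencing operations per (target, action)."""
    started: Counter = Counter()
    timed_out: Counter = Counter()
    for line in lines:
        m = INITIATE.search(line)
        if m:
            action, target, _op_id = m.groups()
            started[(target, action)] += 1
            continue
        m = TIMEOUT.search(line)
        if m:
            action, _op_id, target = m.groups()
            timed_out[(target, action)] += 1
    return started, timed_out


# Two entries from the log above, joined back onto single lines.
sample = [
    "Apr 06 19:05:49 node1 stonith-ng: [2591]: ERROR: remote_op_timeout: "
    "Action reboot (2311741c-fc3b-4094-badd-0ac9e10a209b) for node2 timed out",
    "Apr 06 19:05:50 node1 stonith-ng: [2591]: info: "
    "initiate_remote_stonith_op: Initiating remote operation reboot for "
    "node2: e5e1a936-c038-42bc-acff-18c2a41e9ae2",
]
started, timed_out = summarize_fencing(sample)
for (target, action), count in started.items():
    print(f"{action} of {target}: {count} started, "
          f"{timed_out[(target, action)]} timed out")
```

Run against the full log, a steadily growing count of started and timed-out reboots for node2 would back up the "over and over again" description.
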
>
> Here is the relevant part of my cluster config:
> node node1 \
>         attributes standby="off"
> node node2 \
>         attributes standby="off"
> primitive fence_node1 stonith:fence_ipmilan \
>         params action="reboot" ipaddr="192.168.1.160" login="ADMIN"
> passwd="ADMIN" pcmk_host_check="static-list" pcmk_host_list="node1"
> primitive fence_node1_apc stonith:fence_apc_snmp \
>         params ipaddr="192.168.1.180" action="reboot" port="node1"
> community="private" pcmk_host_check="static-list" pcmk_host_list="node1"
> priority="20"
> primitive fence_node2_apc stonith:fence_apc_snmp \
>         params ipaddr="192.168.1.180" action="reboot" port="node2"
> community="private" pcmk_host_check="static-list" pcmk_host_list="node2"
> priority="100"
> primitive fence_node2_ipmi stonith:fence_ipmilan \
>         params action="reboot" ipaddr="192.168.1.161" login="ADMIN"
> passwd="ADMIN" pcmk_host_check="static-list" pcmk_host_list="node2"
> priority="10"
> location fence-node1_apc-on-node2 fence_node1_apc -inf: node1
> location fence_node1-on-node2 fence_node1 -inf: node1
> location fence_node2-on-node1 fence_node2_apc -inf: node2
> location fence_node2_ipmi-on-node1 fence_node2_ipmi -inf: node2
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="true" \
>         no-quorum-policy="ignore" \
>         stonith-timeout="30s"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
>
>
> Best Regards,
> Richard Cernava
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
