Re: [Pacemaker] Pacemaker fails to switch on or off PDU sockets with fence_wti

Andrew Beekhof Sun, 23 Jun 2013 17:47:46 -0700

On 21/06/2013, at 5:38 PM, Thibaut Pouzet <[email protected]> 
wrote:


> On 20/06/2013 12:23, Andrew Beekhof wrote:
>> On 20/06/2013, at 6:51 PM, Thibaut Pouzet <[email protected]> 
>> wrote:
>> 
>>> On 19/06/2013 23:57, Andrew Beekhof wrote:
>>>> On 20/06/2013, at 1:57 AM, Thibaut Pouzet 
>>>> <[email protected]> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am trying to configure fencing on a test platform with two nodes under 
>>>>> corosync+cman+pacemaker on CentOS 6.4. Both nodes have a double power 
>>>>> supply from a WTI NPS-8HD16-3. IPMI fencing works like a charm, however I 
>>>>> cannot get the WTI fencing to work.
>>>>> 
>>>>> The problem is that the action="" parameter seems to be ignored by 
>>>>> pacemaker.
>>>>> * This is the primitive:
>>>>> primitive wti_fence02_port2_off stonith:fence_wti \
>>>>>        params ipaddr="" action="off" pcmk_host_check="none" port="A2" 
>>>>> pcmk_host_check="static-list" pcmk_host_list="fence02.lyra-network.com" 
>>>>> login="" passwd="" shell_timeout="20" login_timeout="20"
>>>>> 
>>>>> * These are the corresponding log lines :
>>>>> Jun 19 16:56:45 fence01 stonith-ng[19266]:   notice: log_operation: 
>>>>> Operation 'reboot' [19953] (call 0 from crmd.19268) for host 
>>>>> 'fence02.lyra-network.com' with device 'wti_fence02_port2_off' returned: 
>>>>> 0 (OK)
>>>>> Jun 19 16:56:45 fence01 stonith-ng[19266]:   notice: 
>>>>> process_remote_stonith_exec: Call to wti_fence02_port2_off for 
>>>>> fence02.lyra-network.com on behalf of 
>>>>> [email protected]: passed (0)
>>>>> 
>>>>> * These are the versions used:
>>>>> pacemaker-1.1.8-7.el6.x86_64
>>>>> corosync-1.4.1-15.el6.x86_64
>>>>> cman-3.0.12.1-49.el6.x86_64
>>>>> fence-agents-3.1.5-25.el6_4.2.x86_64
>>>>> 
>>>>> The same thing happens with "on" actions.
>>>>> 
>>>>> When I run fence_wti from the command line, it works perfectly fine with ON 
>>>>> or OFF actions! I suspect there is a workaround using something like 
>>>>> pcmk_reboot_action="/ON", but I don't understand how to use it...
>>>>> 
>>>>> (FYI, I'm using fencing topology like this :
>>>>> fencing_topology \
>>>>>        fence01.lyra-network.com: 
>>>>> wti_fence01_port1_off,wti_fence01_port5_off,wti_fence01_port5_on,wti_fence01_port1_on
>>>>>  ipmi_fence01 \
>>>>>        fence02.lyra-network.com: 
>>>>> wti_fence02_port2_off,wti_fence02_port6_off,wti_fence02_port6_on,wti_fence02_port2_on
>>>>>  ipmi_fence02 )
>>>>> 
>>>>> What is wrong here ?
>>>> I believe you're trying to use the per-agent pcmk_reboot_action option 
>>>> (man stonithd)
>>>> But you might be better off with the global stonith-action option (man 
>>>> pengine)
>>>> 
>>> Hmm, I think I was not clear enough in my initial e-mail. The use of 
>>> "pcmk_reboot_action" or "stonith-action" is not the root of my problem. The 
>>> initial problem is that when I configure action="off"
>> My point would be that action=off is not the correct way to configure what 
>> you're trying to do.
>> 
>>> with a stonith primitive,  when this primitive is called, the actual action 
>>> that is launched through fence_wti is "reboot".
>>> 
>>> Therefore, when a node needs to be fenced, instead of having on the PDU :
>>> Port 2 OFF -> Port 6 OFF -> Port 6 ON -> Port 2 ON
>>> I have :
>>> Port 2 Reboot -> Port 6 Reboot -> Port 6 Reboot -> Port 2 Reboot
>>> 
>>> All actions are successful, pacemaker changes the fenced node's status from 
>>> "UNCLEAN" to "OFFLINE", while the node has not been rebooted at all.
>>> 
>>> -- 
>>> Thibaut Pouzet
>>> 
>>> 
> Okay, I took a look at these options, and replaced action="" from my 
> primitives with stonith-action="off" as a global property. I removed the 
> useless primitives and changed the topology :
> 
> fencing_topology \
>        fence01.lyra-network.com: wti_fence01_port1_off,wti_fence01_port5_off 
> ipmi_fence01 \
>        fence02.lyra-network.com: wti_fence02_port2_off,wti_fence02_port6_off 
> ipmi_fence02
> 
> My faulty node is off now; it was shut down through the WTI. Next step: 
> powering the nodes back on. I'm not sure we can achieve such a thing with 
> this method though...

This should do the trick:
   stonith_admin --unfence fence01.lyra-network.com

A future version of the agent should actually support reboot with multiple 
ports though.
If you're impatient, you could try the latest upstream release.

> 
> I looked at the code of fence_wti, and how it was called from pacemaker, and 
> I believe there could be a minor patch to the fencing agent that would make 
> everything easier :
> * On WTI switches, you can configure named port groups, and reboot a port 
> group (i.e. several PSUs) the same way you reboot a single port.
> * These port groups can be monitored via the command '/SG', as opposed to 
> single ports, which are monitored with '/S'. The output is slightly 
> different, but not by much.
> * When you call fence_wti with a named port group, the script wants to get 
> the status of the port group before taking any action. Since port group 
> statuses are not available from the '/S' command, it fails. However, if 
> fence_wti simply tried '/SG' when '/S' fails, it would get the group's 
> status, and could then do '/OFF port_group_name' (or /ON, /BOOT ..) the 
> same way it already does '/OFF single_port'.

That also seems a reasonable approach.
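
For illustration only, the fallback described in the quoted list could be
sketched roughly as below. This is not the actual fence_wti code: the
send_command callable, the parse_status helper and the pipe-separated
"name | ... | ON/OFF" output layout are all assumptions made up for the
sketch, and the real '/S' and '/SG' output would need its own parsing.

```python
def parse_status(output, name):
    """Return 'on'/'off' for `name`, or None if it is not listed.

    ASSUMPTION: each status line is pipe-separated and contains the
    plug (or group) name and an ON/OFF field. Real WTI output differs.
    """
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if name in fields:
            for field in fields:
                if field.upper() in ("ON", "OFF"):
                    return field.lower()
    return None


def get_status(send_command, name):
    """Look `name` up with '/S' first, then fall back to '/SG'.

    `send_command` is a hypothetical callable that sends one command
    to the switch and returns its text output.
    """
    for command in ("/S", "/SG"):
        status = parse_status(send_command(command), name)
        if status is not None:
            return status
    return None
```

With something like this, a single plug is still resolved by '/S' alone, and
'/SG' is only queried when the name is not found there, so existing
single-port setups would behave as before.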


_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
