On 21/06/2013, at 5:38 PM, Thibaut Pouzet <[email protected]> wrote:
> On 20/06/2013 12:23, Andrew Beekhof wrote:
>> On 20/06/2013, at 6:51 PM, Thibaut Pouzet <[email protected]>
>> wrote:
>>
>>> On 19/06/2013 23:57, Andrew Beekhof wrote:
>>>> On 20/06/2013, at 1:57 AM, Thibaut Pouzet
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to configure fencing on a test platform with two nodes under
>>>>> corosync+cman+pacemaker on CentOS 6.4. Both nodes have a double power
>>>>> supply from a WTI NPS-8HD16-3. IPMI fencing works like a charm, however I
>>>>> cannot get the WTI fencing to work.
>>>>>
>>>>> The problem is that the parameter action="" seems to be ignored by
>>>>> pacemaker.
>>>>> * This is the primitive:
>>>>> primitive wti_fence02_port2_off stonith:fence_wti \
>>>>> params ipaddr="" action="off" pcmk_host_check="none" port="A2" \
>>>>> pcmk_host_check="static-list" pcmk_host_list="fence02.lyra-network.com" \
>>>>> login="" passwd="" shell_timeout="20" login_timeout="20"
>>>>>
>>>>> * These are the corresponding log lines:
>>>>> Jun 19 16:56:45 fence01 stonith-ng[19266]: notice: log_operation:
>>>>> Operation 'reboot' [19953] (call 0 from crmd.19268) for host
>>>>> 'fence02.lyra-network.com' with device 'wti_fence02_port2_off' returned:
>>>>> 0 (OK)
>>>>> Jun 19 16:56:45 fence01 stonith-ng[19266]: notice:
>>>>> process_remote_stonith_exec: Call to wti_fence02_port2_off for
>>>>> fence02.lyra-network.com on behalf of
>>>>> [email protected]: passed (0)
>>>>>
>>>>> * These are the versions used:
>>>>> pacemaker-1.1.8-7.el6.x86_64
>>>>> corosync-1.4.1-15.el6.x86_64
>>>>> cman-3.0.12.1-49.el6.x86_64
>>>>> fence-agents-3.1.5-25.el6_4.2.x86_64
>>>>>
>>>>> The same thing happens with "on" actions.
>>>>>
>>>>> When I run fence_wti from the command line, it works perfectly fine with
>>>>> ON or OFF actions! I feel there is a workaround with something like
>>>>> pcmk_reboot_action="/ON", but I don't understand how to use this...
>>>>>
>>>>> (FYI, I'm using a fencing topology like this:
>>>>> fencing_topology \
>>>>> fence01.lyra-network.com: wti_fence01_port1_off,wti_fence01_port5_off,wti_fence01_port5_on,wti_fence01_port1_on ipmi_fence01 \
>>>>> fence02.lyra-network.com: wti_fence02_port2_off,wti_fence02_port6_off,wti_fence02_port6_on,wti_fence02_port2_on ipmi_fence02 )
>>>>>
>>>>> What is wrong here?
>>>> I believe you're trying to use the per-agent pcmk_reboot_action option
>>>> (man stonithd)
>>>> But you might be better off with the global stonith-action option (man
>>>> pengine)
>>>>
>>> Hmm, I don't think I was clear enough in my initial e-mail. The usage of
>>> "pcmk_reboot_action" or "stonith-action" is not the root of my problem. The
>>> initial problem is that when I configure action="off"
>> My point would be that action=off is not the correct way to configure what
>> you're trying to do.
>>
>>> on a stonith primitive, the action that fence_wti actually runs when this
>>> primitive is called is "reboot".
>>>
>>> Therefore, when a node needs to be fenced, instead of getting this on the PDU:
>>> Port 2 OFF -> Port 6 OFF -> Port 6 ON -> Port 2 ON
>>> I get:
>>> Port 2 Reboot -> Port 6 Reboot -> Port 6 Reboot -> Port 2 Reboot
>>>
>>> All actions are successful, and pacemaker changes the fenced node's status
>>> from "UNCLEAN" to "OFFLINE", while the node has not been rebooted at all.
>>>
>>> --
>>> Thibaut Pouzet
>>>
>>>
> Okay, I took a look at these options, removed action="" from my primitives
> and set stonith-action="off" as a global property instead. I removed the
> useless primitives and changed the topology:
>
> fencing_topology \
> fence01.lyra-network.com: wti_fence01_port1_off,wti_fence01_port5_off ipmi_fence01 \
> fence02.lyra-network.com: wti_fence02_port2_off,wti_fence02_port6_off ipmi_fence02
>
> My faulty node is off now; it was shut down through the WTI. Next step:
> rebooting the nodes. I'm not sure we can achieve such a thing with this method
> though...

This should do the trick:

  stonith_admin --unfence fence01.lyra-network.com

A future version of the agent should actually support reboot with multiple
ports, though. If you're impatient, you could try the latest upstream release.

>
> I looked at the code of fence_wti and at how it is called from pacemaker, and
> I believe a minor patch to the fencing agent would make everything easier:
> * On WTI switches, you can configure named port groups, and reboot a port
> group (i.e. several PSUs) the same way you reboot a single port.
> * The status of port groups can be queried with the command '/SG', as opposed
> to single ports, which are queried with '/S'. The output is a bit different,
> but not by much.
> * When you call fence_wti with a named port group, the script wants to get
> the status of the port group before taking any action. Since port group
> statuses are not available from the '/S' command, it fails. However, if
> fence_wti simply tried '/SG' when '/S' fails, it would get the group's status
> and could then do '/OFF port_group_name' (or /ON, /BOOT ...) the same way it
> already does '/OFF single_port'.

That also seems like a reasonable approach.

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
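The fencing_topology behaviour discussed in this thread can be illustrated with a simplified Python model. It is not Pacemaker's implementation, and `fence_with_topology` and the callable-per-device interface are hypothetical. The model rests on these assumptions:

* Space-separated levels are tried in order.
* Every comma-separated device within a level must succeed for that level to pass.
* Each device receives the action of the fencing request ("reboot" by default, or whatever stonith-action is set to) rather than an action="" parameter on the primitive. This matches what Thibaut saw on the PDU.

As Andrew notes, future versions may handle a reboot across several ports differently.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one fencing device: called with (action, target),
# returns True on success. Real devices run an external agent such as fence_wti.
Device = Callable[[str, str], bool]


def fence_with_topology(
    levels: List[List[str]],
    devices: Dict[str, Device],
    target: str,
    action: str = "reboot",
) -> bool:
    """Simplified model of fencing levels (a sketch, not Pacemaker's code).

    Levels are tried in order. Every device in a level receives the action of
    the fencing request (default "reboot", or the value of stonith-action);
    a level passes only if all of its devices succeed, and the first passing
    level ends the operation.
    """
    for level in levels:
        for name in level:
            if not devices[name](action, target):
                break  # one failure abandons this level; try the next one
        else:
            return True  # every device in this level succeeded
    return False  # no level passed, so the node was not fenced
```

With the revised topology from Thibaut's second message, a call like `fence_with_topology([["wti_fence02_port2_off", "wti_fence02_port6_off"], ["ipmi_fence02"]], devices, "fence02.lyra-network.com", action="off")` switches off both WTI outlets. It only reaches the IPMI level if one of them fails.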

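Thibaut's proposed change to fence_wti has three steps:

1. Query '/S' first.
2. Fall back to '/SG' when the target is a named port group.
3. Issue '/OFF', '/ON' or '/BOOT' against the port or group in the same way.

The Python sketch below illustrates this logic and is not the fence_wti source. The `get_status` and `power_command` helpers are hypothetical. The `query` callable is also hypothetical; it is assumed to return an already-parsed {name: state} table. Parsing the real '/S' and '/SG' output is left out because the thread only says the two formats differ slightly. A group name such as "node2_psus" would be whatever was configured on the switch.

```python
from typing import Callable, Dict

# Hypothetical: sends one command to the WTI switch and returns its status
# table already parsed into {port_or_group_name: "ON" | "OFF"}.
StatusQuery = Callable[[str], Dict[str, str]]

# Power commands named in the thread; assumed to take a port or a group name.
POWER_COMMANDS = {"on": "/ON", "off": "/OFF", "reboot": "/BOOT"}


def get_status(query: StatusQuery, target: str) -> str:
    """Look the target up as a single port first, then as a named port group."""
    ports = query("/S")
    if target in ports:
        return ports[target]
    groups = query("/SG")  # fallback only when /S does not list the target
    if target in groups:
        return groups[target]
    raise KeyError(f"{target!r} is neither a port nor a port group")


def power_command(action: str, target: str) -> str:
    """Build the command line, e.g. '/OFF A2' or '/OFF node2_psus'."""
    return f"{POWER_COMMANDS[action]} {target}"
```

Trying '/S' first keeps the existing single-port path unchanged. The extra '/SG' round trip only happens for targets that are not individual ports.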