Hi, 

I've tested "Bug 1090511 [RFE] Improve fencing robustness by retrying failed 
attempts".
Spoiler alert: the tested feature worked, but fencing was not successful due to 
bug https://bugzilla.redhat.com/1124141

---

How to set up the environment for testing:
- 3 hosts are required, at least two of them with PM (power management) enabled.
- 2 hosts (A, B), with PM enabled, should be in one cluster, the remaining one 
(C) in another cluster. The reason is that the search for a fencing proxy is 
done in the same cluster first; only if no host is available there are hosts 
outside of this cluster considered. This separation is needed to make sure that 
the right (not working) fencing proxy is selected first.
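
To illustrate the search order described above, here is a minimal sketch. It is 
not the engine's actual code; the function name, the input format and the 
cluster names are made up for this example.

```shell
# Hypothetical illustration of the proxy search order; not engine code.
# $1 = host to fence, $2 = its cluster, $3 = lines of "<host> <cluster>".
proxy_candidates() {
    target="$1"; cluster="$2"; hosts="$3"
    # Hosts from the same cluster are listed first ...
    printf '%s\n' "$hosts" |
        awk -v t="$target" -v c="$cluster" '$1 != t && $2 == c { print $1 }'
    # ... followed by hosts from any other cluster.
    printf '%s\n' "$hosts" |
        awk -v t="$target" -v c="$cluster" '$1 != t && $2 != c { print $1 }'
}

# Made-up layout matching the test setup: A and B together, C elsewhere.
hosts="A cluster1
B cluster1
C cluster2"
proxy_candidates A cluster1 "$hosts"
```

With this layout the sketch lists B before C, which is the order the test 
relies on.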

Notation:
host A ~ defective host to be fenced
host B ~ first selected fencing proxy, which will fail to fence host A.
host C ~ second selected fencing proxy, which should succeed in fencing host A.
A and B are in the same cluster.

Process:
1. On host B we alter iptables so that it cannot contact host A and fence it. 
SSH was blocked to prevent soft fencing and IPMI was blocked to prevent 'hard' 
fencing.

iptables -A OUTPUT -p udp -d 10.34.63.198 --dport 623 -j DROP
iptables -A OUTPUT -p tcp -d 10.34.63.178 --dport 22 -j DROP
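
For cleanup after the test, the same rule specifications can be passed to 
iptables with -D instead of -A. A small sketch follows; the helper name is 
hypothetical, and it only prints the commands so they can be reviewed before 
being run as root on host B.

```shell
# Hypothetical helper for host B: prints iptables commands, does not run them.
# $1 = -A (append) or -D (delete), $2 = address used for IPMI (udp/623),
# $3 = address used for SSH (tcp/22).
fence_block_rules() {
    action="$1"; ipmi_addr="$2"; ssh_addr="$3"
    echo "iptables $action OUTPUT -p udp -d $ipmi_addr --dport 623 -j DROP"
    echo "iptables $action OUTPUT -p tcp -d $ssh_addr --dport 22 -j DROP"
}

# Print the cleanup commands for the addresses used in this test.
fence_block_rules -D 10.34.63.198 10.34.63.178
```

The output can be piped to sh once it looks right.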

2. On host A, the rule allowing connections to vdsm was removed and vdsm was 
restarted, so all ssh connections need to be reopened. That makes the engine 
think that the host is down/overloaded.
Removed rule: 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:54321

followed by
systemctl restart vdsmd
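
As a sketch, step 2 could be scripted roughly like this. The chain name INPUT 
is an assumption (the listing above does not show the chain), and deleting by 
rule specification only works if the rule has no further match options, so 
check with iptables -S first. The helper name is hypothetical and it only 
prints the commands.

```shell
# Hypothetical helper for host A: prints the commands instead of running them.
# Assumes the vdsm rule is in the given chain (default INPUT, an assumption)
# and consists only of "-p tcp --dport 54321 -j ACCEPT".
vdsm_block_cmds() {
    chain="${1:-INPUT}"
    echo "iptables -D $chain -p tcp --dport 54321 -j ACCEPT"
    echo "systemctl restart vdsmd"
}

vdsm_block_cmds
```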


Result: After the restart of vdsmd, the engine recognised host A as unresponsive 
and tried to fence it. The first attempt to fence host A was performed by host B 
and failed as expected; the second attempt was performed by host C and, from 
the code perspective, succeeded. The error message [1] was displayed correctly. 
However, the fence was not successful due to bug 
https://bugzilla.redhat.com/1124141, which causes a 
java.lang.StackOverflowError. The code for the tested RFE (bug 1090511) should 
be OK, but it will only work once the mentioned bug is fixed.

M.

[1]. Fencing operation failed with proxy host <ID>, trying another proxy...