As far as I know fence_xvm supports multiple hosts, but you need to open the port on both the hypervisor (UDP) and the guest (TCP). 'fence_xvm -o list' should provide a list of VMs from all hosts that responded (and have the key). Usually the biggest problem is the multicast traffic, as in many environments it gets dropped by firewalls.
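For example, a rough sketch assuming firewalld on the CentOS hypervisors, ufw on the Ubuntu guests, and the fence_virt defaults (port 1229, multicast address 225.0.0.12) - adjust to whatever your fence_virt.conf actually uses:

# on each hypervisor: allow the multicast requests coming from the guests
firewall-cmd --permanent --add-port=1229/udp
firewall-cmd --reload

# on each Ubuntu guest: allow the TCP reply channel used by fence_xvm
ufw allow 1229/tcp

# then test from a guest, naming the multicast address explicitly
fence_xvm -a 225.0.0.12 -o list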
Best Regards,
Strahil Nikolov

On July 6, 2020 12:24:08 GMT+03:00, Klaus Wenninger <kwenn...@redhat.com> wrote:
>On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote:
>> Hello,
>>
>> # fence_xvm -o list
>> kvm102    bab3749c-15fc-40b7-8b6c-d4267b9f0eb9    on
>>
>> >This should show both VMs, so getting to that point will likely solve
>> >your problem. fence_xvm relies on multicast, there could be some
>> >obscure network configuration to get that working on the VMs.
>
>You said you tried on both hosts. What does 'virsh list' give you on
>the 2 hosts? Hopefully different names for the VMs ...
>Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well?
>Did you try pinging via the physical network that is connected to the
>bridge configured to be used for fencing?
>If I got it right, fence_xvm should support collecting answers from
>multiple hosts, but I found a suggestion to do a setup with 2
>multicast addresses & keys for each host.
>Which route did you go?
>
>Klaus
>
>> Thank you for pointing me in that direction. We have tried to solve
>> that, but with no success. We were using a howto provided here:
>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>
>> The problem is, it specifically states that the tutorial does not yet
>> support the case where guests are running on multiple hosts. There are
>> some short hints on what might be necessary to do, but working through
>> those sadly did not work, nor were there any clues which would help us
>> find a solution ourselves. So now we are completely stuck here.
>>
>> Has someone the same configuration with guest VMs on multiple hosts?
>> And how did you manage to get that to work? What do we need to do to
>> resolve this? Is there maybe even someone who would be willing to take
>> a closer look at our server? Any help would be greatly appreciated!
>>
>> Kind regards
>> Stefan Schmitz
>>
>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com
>>> wrote:
>>>> Hello,
>>>>
>>>> I hope someone can help with this problem. We are (still) trying to
>>>> get Stonith working to achieve a running active/active HA cluster,
>>>> but sadly to no avail.
>>>>
>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>>>> The Ubuntu VMs are the ones which should form the HA cluster.
>>>>
>>>> The current status is this:
>>>>
>>>> # pcs status
>>>> Cluster name: pacemaker_cluster
>>>> WARNING: corosync and pacemaker node names do not match (IPs used in
>>>> setup?)
>>>> Stack: corosync
>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>>> with quorum
>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
>>>> server4ubuntu1
>>>>
>>>> 2 nodes configured
>>>> 13 resources configured
>>>>
>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>
>>>> Full list of resources:
>>>>
>>>>  stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>  Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>      Masters: [ server4ubuntu1 ]
>>>>      Slaves: [ server2ubuntu1 ]
>>>>  Master/Slave Set: WebDataClone [WebData]
>>>>      Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>  Clone Set: dlm-clone [dlm]
>>>>      Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>      ClusterIP:0    (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>>>>      ClusterIP:1    (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>>>>  Clone Set: WebFS-clone [WebFS]
>>>>      Started: [ server4ubuntu1 ]
>>>>      Stopped: [ server2ubuntu1 ]
>>>>  Clone Set: WebSite-clone [WebSite]
>>>>      Started: [ server4ubuntu1 ]
>>>>      Stopped: [ server2ubuntu1 ]
>>>>
>>>> Failed Actions:
>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
>>>>     status=Error, exitreason='',
>>>>     last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203,
>>>>     status=complete, exitreason='',
>>>>     last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
>>>>     status=Error, exitreason='',
>>>>     last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>
>>>>
>>>> The stonith resource is stopped and does not seem to work.
>>>> On both hosts the command
>>>> # fence_xvm -o list
>>>> kvm102    bab3749c-15fc-40b7-8b6c-d4267b9f0eb9    on
>>>
>>> This should show both VMs, so getting to that point will likely solve
>>> your problem. fence_xvm relies on multicast, there could be some
>>> obscure network configuration to get that working on the VMs.
>>>
>>>> returns the local VM. Apparently it connects through the
>>>> virtualization interface, because it returns the VM name, not the
>>>> hostname of the client VM. I do not know if this is how it is
>>>> supposed to work?
>>>
>>> Yes, fence_xvm knows only about the VM names.
>>>
>>> To get Pacemaker to be able to use it for fencing the cluster nodes,
>>> you have to add a pcmk_host_map parameter to the fencing resource. It
>>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
>>>
>>>> In the local network, all traffic is allowed. No firewall is locally
>>>> active; only the connections leaving the local network are
>>>> firewalled. Hence there are no connection problems between the hosts
>>>> and clients. For example, we can successfully connect from the
>>>> clients to the hosts:
>>>>
>>>> # nc -z -v -u 192.168.1.21 1229
>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>> Ncat: Connected to 192.168.1.21:1229.
>>>> Ncat: UDP packet sent successfully
>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>
>>>> # nc -z -v -u 192.168.1.13 1229
>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>> Ncat: Connected to 192.168.1.13:1229.
>>>> Ncat: UDP packet sent successfully
>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>
>>>>
>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>> according to the howto provided here:
>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>
>>>> The actual line we used:
>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>>   hostlist="Host4,host2"
>>>>   hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>
>>>> But as you can see in the pcs status output, stonith is stopped and
>>>> exits with an unknown error.
>>>>
>>>> Can somebody please advise on how to proceed or what additional
>>>> information is needed to solve this problem?
>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>
>>>> Kind regards
>>>> Stefan Schmitz

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/