On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote:
> Hello,
>
>>> # fence_xvm -o list
>>> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>
>> This should show both VMs, so getting to that point will likely solve
>> your problem. fence_xvm relies on multicast, there could be some
>> obscure network configuration to get that working on the VMs.

You said you tried on both hosts. What does 'virsh list' give you on
the 2 hosts? Hopefully different names for the VMs ...

Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well?

Did you try pinging via the physical network that is connected to the
bridge configured to be used for fencing?

If I got it right, fence_xvm should support collecting answers from
multiple hosts, but I found a suggestion to do a setup with two
multicast addresses & keys, one per host. Which route did you go?

Klaus
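For reference, a minimal sketch of those checks (225.0.0.12 is only the
fence_virtd multicast default; substitute whatever address and key the
hosts are actually configured with):

  # on each CentOS host: the VM names as libvirt knows them
  virsh list --all

  # on each Ubuntu guest: ask the fence_virtd daemons on the hosts over multicast
  fence_xvm -a 225.0.0.12 -o list

  # from a guest: check the path to a host address (address taken from the thread below)
  ping -c 3 192.168.1.21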
> Thank you for pointing me in that direction. We have tried to solve
> that but with no success. We were using a howto provided here:
> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>
> The problem is, it specifically states that the tutorial does not yet
> support the case where guests are running on multiple hosts. There are
> some short hints about what might be necessary to do, but working
> through those sadly did not work, nor were there any clues that would
> help us find a solution ourselves. So now we are completely stuck here.
>
> Does anyone have the same configuration with guest VMs on multiple
> hosts? And how did you manage to get that to work? What do we need to
> do to resolve this? Is there maybe even someone who would be willing
> to take a closer look at our server? Any help would be greatly
> appreciated!
>
> Kind regards
> Stefan Schmitz
>
>
> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com
>> wrote:
>>> Hello,
>>>
>>> I hope someone can help with this problem. We are (still) trying to
>>> get Stonith working to achieve a running active/active HA cluster,
>>> but sadly to no avail.
>>>
>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>>> The Ubuntu VMs are the ones which should form the HA cluster.
>>>
>>> The current status is this:
>>>
>>> # pcs status
>>> Cluster name: pacemaker_cluster
>>> WARNING: corosync and pacemaker node names do not match (IPs used in
>>> setup?)
>>> Stack: corosync
>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>> with quorum
>>> Last updated: Thu Jul 2 17:03:53 2020
>>> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on
>>> server4ubuntu1
>>>
>>> 2 nodes configured
>>> 13 resources configured
>>>
>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>
>>> Full list of resources:
>>>
>>>  stonith_id_1   (stonith:external/libvirt):    Stopped
>>>  Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>      Masters: [ server4ubuntu1 ]
>>>      Slaves: [ server2ubuntu1 ]
>>>  Master/Slave Set: WebDataClone [WebData]
>>>      Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>  Clone Set: dlm-clone [dlm]
>>>      Started: [ server2ubuntu1 server4ubuntu1 ]
>>>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>      ClusterIP:0   (ocf::heartbeat:IPaddr2):   Started server2ubuntu1
>>>      ClusterIP:1   (ocf::heartbeat:IPaddr2):   Started server4ubuntu1
>>>  Clone Set: WebFS-clone [WebFS]
>>>      Started: [ server4ubuntu1 ]
>>>      Stopped: [ server2ubuntu1 ]
>>>  Clone Set: WebSite-clone [WebSite]
>>>      Started: [ server4ubuntu1 ]
>>>      Stopped: [ server2ubuntu1 ]
>>>
>>> Failed Actions:
>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
>>>   call=201, status=Error, exitreason='',
>>>   last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms
>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8):
>>>   call=203, status=complete, exitreason='',
>>>   last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
>>>   call=202, status=Error, exitreason='',
>>>   last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms
>>>
>>>
>>> The stonith resource is stopped and does not seem to work.
>>> On both hosts the command
>>>
>>> # fence_xvm -o list
>>> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>
>> This should show both VMs, so getting to that point will likely solve
>> your problem. fence_xvm relies on multicast, there could be some
>> obscure network configuration to get that working on the VMs.
>>
>>> returns the local VM. Apparently it connects through the
>>> virtualization interface, because it returns the VM name, not the
>>> hostname of the client VM. I do not know if this is how it is
>>> supposed to work?
>>
>> Yes, fence_xvm knows only about the VM names.
>>
>> To get pacemaker to be able to use it for fencing the cluster nodes,
>> you have to add a pcmk_host_map parameter to the fencing resource. It
>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
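To make that concrete, here is a hedged sketch of such a resource using
the fence_xvm agent that is being tested above, assuming the default key
location. The resource name fence_guests and the VM name kvm104 for the
second host are placeholders; use the names 'virsh list' actually
reports on each host:

  # map cluster node names to libvirt VM names so pacemaker can fence them
  pcs stonith create fence_guests fence_xvm \
      key_file=/etc/cluster/fence_xvm.key \
      pcmk_host_map="server2ubuntu1:kvm102;server4ubuntu1:kvm104" \
      op monitor interval=60s

The same pcmk_host_map parameter can equally be added to an existing
external/libvirt resource; the mapping format is what matters.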
>>> In the local network, all traffic is allowed. No firewall is locally
>>> active; only connections leaving the local network are firewalled.
>>> Hence there are no connection problems between the hosts and clients.
>>> For example, we can successfully connect from the clients to the
>>> hosts:
>>>
>>> # nc -z -v -u 192.168.1.21 1229
>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>> Ncat: Connected to 192.168.1.21:1229.
>>> Ncat: UDP packet sent successfully
>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>
>>> # nc -z -v -u 192.168.1.13 1229
>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>> Ncat: Connected to 192.168.1.13:1229.
>>> Ncat: UDP packet sent successfully
>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>
>>> On the Ubuntu VMs we created and configured the stonith resource
>>> according to the howto provided here:
>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>
>>> The actual line we used:
>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>   hostlist="Host4,host2"
>>>   hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>
>>> But as you can see in the pcs status output, stonith is stopped and
>>> exits with an unknown error.
>>>
>>> Can somebody please advise on how to proceed, or what additional
>>> information is needed to solve this problem?
>>> Any help would be greatly appreciated! Thank you in advance.
>>>
>>> Kind regards
>>> Stefan Schmitz

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
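For reference, the host-side settings that the nc tests above exercise
(multicast address, port 1229, interface, and key) live in
/etc/fence_virt.conf on each CentOS host, normally written by
fence_virtd -c. A minimal sketch, assuming the defaults and a
hypothetical bridge name br0:

  fence_virtd {
          listener = "multicast";
          backend = "libvirt";
  }

  listeners {
          multicast {
                  address = "225.0.0.12";
                  port = "1229";
                  interface = "br0";
                  key_file = "/etc/cluster/fence_xvm.key";
                  family = "ipv4";
          }
  }

  backends {
          libvirt {
                  uri = "qemu:///system";
          }
  }

For the single-address approach Klaus describes, both hosts and both
guests need the same key file and the same address/port; the interface
must be the bridge that actually carries the host-to-guest traffic.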