Have you run 'fence_virtd -c'? I made a silly mistake the last time I deployed it and the daemon was not listening on the right interface. Netstat can verify that. Also, as far as I know the hosts use unicast to reply to the VMs (thus tcp/1229 and not udp/1229).
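For example, something along these lines should show whether the daemon picked the right interface. The 'interface' entry in the multicast listener block should be the bridge the VMs are attached to, and 'address' should be the multicast address you query (225.0.0.12 here). The config path is the usual one on CentOS, adjust if yours differs:

# grep -A5 'multicast {' /etc/fence_virt.conf
# netstat -ulnp | grep 1229

If it turns out to be bound to the wrong interface, re-running 'fence_virtd -c' and picking the VM bridge should fix it.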
If you have a developer account for Red Hat, you can check https://access.redhat.com/solutions/917833

Best Regards,
Strahil Nikolov

On 9 July 2020 17:01:13 GMT+03:00, "stefan.schm...@farmpartner-tec.com" <stefan.schm...@farmpartner-tec.com> wrote:
>Hello,
>
>thanks for the advice. I have worked through that list as follows:
>
>> - key deployed on the Hypervisors
>> - key deployed on the VMs
>I created the key file a while ago on one host and distributed it to every other host and guest. Right now it resides on all 4 machines in the same path: /etc/cluster/fence_xvm.key
>Is there maybe a corosync/Stonith or other function which checks the key files for any corruption or errors?
>
>> - fence_virtd running on both Hypervisors
>It is running on each host:
># ps aux | grep fence_virtd
>root  62032  0.0  0.0  251568  4496 ?  Ss  Jun29  0:00  fence_virtd
>
>> - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
>
>Command on one host:
>fence_xvm -a 225.0.0.12 -o list
>
>tcpdump on the guest residing on the other host:
>host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176
>host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }]
>host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }]
>
>At least to me it looks like the VMs are reachable by the multicast traffic.
>Additionally, no matter on which host I execute the fence_xvm command, tcpdump shows the same traffic on both guests.
>But on the other hand, at the same time, tcpdump shows nothing on the other host. Just to be sure I flushed iptables beforehand on each host. Is there maybe a problem?
>
>> - fence_xvm on both VMs
>fence_xvm is installed on both VMs:
># which fence_xvm
>/usr/sbin/fence_xvm
>
>Could you please advise on how to proceed? Thank you in advance.
>Kind regards
>Stefan Schmitz
>
>On 08.07.2020 at 20:24, Strahil Nikolov wrote:
>> Erm... network/firewall is always "green". Run tcpdump on Host1 and VM2 (not on the same host).
>> Then run 'fence_xvm -o list' again and check what is captured.
>>
>> In summary, you need:
>> - key deployed on the Hypervisors
>> - key deployed on the VMs
>> - fence_virtd running on both Hypervisors
>> - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
>> - fence_xvm on both VMs
>>
>> In your case, the primary suspect is multicast traffic.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 8 July 2020 16:33:45 GMT+03:00, "stefan.schm...@farmpartner-tec.com" <stefan.schm...@farmpartner-tec.com> wrote:
>>> Hello,
>>>
>>>> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20.
>>>
>>> We have now upgraded our servers to Ubuntu 20.04 LTS and installed the packages fence-virt and fence-virtd.
>>>
>>> The command "fence_xvm -a 225.0.0.12 -o list" on the hosts still just returns the single local VM.
>>>
>>> The same command on both VMs results in:
>>> # fence_xvm -a 225.0.0.12 -o list
>>> Timed out waiting for response
>>> Operation failed
>>>
>>> But just as before, connecting from the guest to the host via nc works fine:
>>> # nc -z -v -u 192.168.1.21 1229
>>> Connection to 192.168.1.21 1229 port [udp/*] succeeded!
>>>
>>> So the hosts and the service are basically reachable.
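(A quick way to see whether it is the unicast reply path that fails, given that the request times out on the guests while nc to udp/1229 works: capture tcp/1229 on a guest while that same guest runs the query. The commands below are only a sketch:)

On the guest, in one terminal:
# tcpdump -nn -i any tcp port 1229

In a second terminal on the same guest:
# fence_xvm -a 225.0.0.12 -o list

If a hypervisor receives the multicast request, you should see it opening a TCP connection back to the guest on port 1229. If nothing shows up at all, either the request never reached fence_virtd or the TCP reply is being dropped somewhere on the way back.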
>>> I have spoken to our firewall tech, and he has assured me that no local traffic is hindered by anything, be it multicast or not.
>>> Software firewalls are not present/active on any of our servers.
>>>
>>> Ubuntu guests:
>>> # ufw status
>>> Status: inactive
>>>
>>> CentOS hosts:
>>> # systemctl status firewalld
>>> ● firewalld.service - firewalld - dynamic firewall daemon
>>>    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
>>>    Active: inactive (dead)
>>>      Docs: man:firewalld(1)
>>>
>>> Any hints or help on how to remedy this problem would be greatly appreciated!
>>>
>>> Kind regards
>>> Stefan Schmitz
>>>
>>> On 07.07.2020 at 10:54, Klaus Wenninger wrote:
>>>> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>>>>> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20.
>>>>>
>>>>> Your other option is to get an iSCSI device from your quorum system and use that for SBD.
>>>>> For the watchdog, you can use the 'softdog' kernel module, or you can use KVM to present one to the VMs.
>>>>> You can also check the '-P' flag for SBD.
>>>> With KVM please use the qemu watchdog and try to avoid using softdog with SBD.
>>>> Especially if you are aiming for a production cluster ...
>>>>
>>>> Adding something like this to the libvirt XML should do the trick:
>>>> <watchdog model='i6300esb' action='reset'>
>>>>   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>>>> </watchdog>
>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>> On 7 July 2020 10:11:38 GMT+03:00, "stefan.schm...@farmpartner-tec.com" <stefan.schm...@farmpartner-tec.com> wrote:
>>>>>>> What does 'virsh list' give you on the 2 hosts? Hopefully different names for the VMs ...
>>>>>> Yes, each host shows its own:
>>>>>>
>>>>>> # virsh list
>>>>>>  Id   Name     State
>>>>>> ----------------------------------------------------
>>>>>>  2    kvm101   running
>>>>>>
>>>>>> # virsh list
>>>>>>  Id   Name     State
>>>>>> ----------------------------------------------------
>>>>>>  1    kvm102   running
>>>>>>
>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well?
>>>>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to run "yum install fence-virt fence-virtd", and those packages do not exist as such in Ubuntu 18.04. After trying to find the appropriate packages we installed "libvirt-clients" and "multipath-tools". Is there maybe something missing or completely wrong?
>>>>>> Though we can connect to both hosts using "nc -z -v -u 192.168.1.21 1229", that just works fine.
>>>>>>
>>>> Without fence-virt you can't expect the whole thing to work.
>>>> Maybe you can build it for your Ubuntu version from the sources of a package for another Ubuntu version, if it doesn't exist yet.
>>>> Btw. which pacemaker version are you using?
>>>> There was a convenience fix on the master branch for at least a couple of days (sometime during the 2.0.4 release cycle) that wasn't compatible with fence_xvm.
>>>>>>> Usually, the biggest problem is the multicast traffic - as in many environments it can be dropped by firewalls.
>>>>>> To make sure, I have asked our datacenter techs to verify that multicast traffic can move unhindered in our local network. In the past they have confirmed on multiple occasions that local traffic is not filtered in any way, but until now I had never specifically asked about multicast traffic, which I have now done. I am waiting for an answer to that question.
>>>>>>
>>>>>> Kind regards
>>>>>> Stefan Schmitz
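(To check the multicast path itself, without fence_virt in the picture, omping can test it end to end; it should be packaged for both CentOS and Ubuntu. The hostnames below are placeholders, the group simply reuses the fence_xvm one, and omping is left on its default port so it does not clash with fence_virtd. Run it on both hosts and both guests at the same time:)

# omping -m 225.0.0.12 host1 host2 kvm101 kvm102

Every machine should report both unicast and multicast answers from every other machine. If multicast answers are missing only between machines that sit on different hypervisors, IGMP snooping or the physical switch configuration is the place to look.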
>>>>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>>>>> On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>> kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9   on
>>>>>>>>> This should show both VMs, so getting to that point will likely solve your problem. fence_xvm relies on multicast; there could be some obscure network configuration needed to get that working on the VMs.
>>>>>>> You said you tried on both hosts. What does 'virsh list' give you on the 2 hosts? Hopefully different names for the VMs ...
>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well?
>>>>>>> Did you try pinging via the physical network that is connected to the bridge configured to be used for fencing?
>>>>>>> If I got it right, fence_xvm should support collecting answers from multiple hosts, but I found a suggestion to do a setup with 2 multicast addresses & keys, one for each host.
>>>>>>> Which route did you go?
>>>>>>>
>>>>>>> Klaus
>>>>>>>> Thank you for pointing me in that direction. We have tried to solve that, but with no success. We were using a howto provided here:
>>>>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>>>>
>>>>>>>> The problem is, it specifically states that the tutorial does not yet support the case where guests are running on multiple hosts. There are some short hints on what might be necessary, but working through those sadly did not work, nor were there any clues that would help us find a solution ourselves. So now we are completely stuck here.
>>>>>>>>
>>>>>>>> Does someone have the same configuration with guest VMs on multiple hosts? And how did you manage to get that to work? What do we need to do to resolve this? Is there maybe even someone who would be willing to take a closer look at our servers? Any help would be greatly appreciated!
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Stefan Schmitz
>>>>>>>>
>>>>>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>>>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I hope someone can help with this problem. We are (still) trying to get Stonith working to achieve a running active/active HA cluster, but sadly to no avail.
>>>>>>>>>>
>>>>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM. The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>>>>
>>>>>>>>>> The current status is this:
>>>>>>>>>>
>>>>>>>>>> # pcs status
>>>>>>>>>> Cluster name: pacemaker_cluster
>>>>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
>>>>>>>>>> Stack: corosync
>>>>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>>>>>>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>>>>>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>>>>>>>>>>
>>>>>>>>>> 2 nodes configured
>>>>>>>>>> 13 resources configured
>>>>>>>>>>
>>>>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>
>>>>>>>>>> Full list of resources:
>>>>>>>>>>
>>>>>>>>>> stonith_id_1 (stonith:external/libvirt): Stopped
>>>>>>>>>> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>>>>     Masters: [ server4ubuntu1 ]
>>>>>>>>>>     Slaves: [ server2ubuntu1 ]
>>>>>>>>>> Master/Slave Set: WebDataClone [WebData]
>>>>>>>>>>     Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>> Clone Set: dlm-clone [dlm]
>>>>>>>>>>     Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>>>>     ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
>>>>>>>>>>     ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
>>>>>>>>>> Clone Set: WebFS-clone [WebFS]
>>>>>>>>>>     Started: [ server4ubuntu1 ]
>>>>>>>>>>     Stopped: [ server2ubuntu1 ]
>>>>>>>>>> Clone Set: WebSite-clone [WebSite]
>>>>>>>>>>     Started: [ server4ubuntu1 ]
>>>>>>>>>>     Stopped: [ server2ubuntu1 ]
>>>>>>>>>>
>>>>>>>>>> Failed Actions:
>>>>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='', last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='', last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='', last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>>>>>>>
>>>>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>>>>> On both hosts the command
>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>> kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9   on
>>>>>>>>> This should show both VMs, so getting to that point will likely solve your problem. fence_xvm relies on multicast; there could be some obscure network configuration needed to get that working on the VMs.
>>>>>>>>>
>>>>>>>>>> returns the local VM. Apparently it connects through the virtualization interface, because it returns the VM name and not the hostname of the client VM. I do not know if this is how it is supposed to work?
>>>>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>>>>
>>>>>>>>> To get pacemaker to be able to use it for fencing the cluster nodes, you have to add a pcmk_host_map parameter to the fencing resource. It looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
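(Purely as an illustration of such a map: once fence_xvm answers correctly on both guests, a fence_xvm-based stonith resource could look roughly like the line below. The resource name is made up, and the node-to-VM mapping is only guessed from the names seen in this thread, so swap it around if kvm101/kvm102 actually run on the other nodes:)

# pcs stonith create fence_xvm_all fence_xvm \
    key_file=/etc/cluster/fence_xvm.key multicast_address=225.0.0.12 \
    pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"

This would replace the external/libvirt agent used further down in the thread with the fence_xvm agent that is actually being debugged here.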
>>>>>>>>>> In the local network, all traffic is allowed. No firewall is active locally; only connections leaving the local network are firewalled. Hence there are no connection problems between the hosts and clients.
>>>>>>>>>> For example, we can successfully connect from the clients to the hosts:
>>>>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>
>>>>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>
>>>>>>>>>> On the Ubuntu VMs we created and configured the stonith resource according to the howto provided here:
>>>>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>>>>
>>>>>>>>>> The actual line we used:
>>>>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>>>>
>>>>>>>>>> But as you can see in the pcs status output, stonith is stopped and exits with an unknown error.
>>>>>>>>>>
>>>>>>>>>> Can somebody please advise on how to proceed, or on what additional information is needed to solve this problem?
>>>>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>>>>
>>>>>>>>>> Kind regards
>>>>>>>>>> Stefan Schmitz
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/