Hello,
>> # fence_xvm -o list
>> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>This should show both VMs, so getting to that point will likely solve
>your problem. fence_xvm relies on multicast, there could be some
>obscure network configuration to get that working on the VMs.
Thank you for pointing me in that direction. We have tried to solve that,
but with no success. We were using the howto provided here:
https://wiki.clusterlabs.org/wiki/Guest_Fencing
The problem is that it specifically states the tutorial does not yet
support the case where guests are running on multiple hosts. There are
some short hints about what might be necessary, but working through
those sadly did not work, nor were there any clues that would help
us find a solution ourselves. So now we are completely stuck here.
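For reference, our current understanding is that a multi-host setup needs fence_virtd running on every host with an identical key and a multicast listener on an interface the VMs can actually reach. What we have been trying in /etc/fence_virt.conf on each host looks roughly like this (the interface name br0 is only a placeholder for whichever bridge carries the VM traffic; address, port, and key file are the documented defaults):

```
fence_virtd {
        listener = "multicast";
        backend = "libvirt";
        module_path = "/usr/lib64/fence-virt";
}

listeners {
        multicast {
                # Defaults; the key file must be identical on both
                # hosts and on both guest VMs.
                address = "225.0.0.12";
                port = "1229";
                family = "ipv4";
                interface = "br0";
                key_file = "/etc/cluster/fence_xvm.key";
        }
}

backends {
        libvirt {
                uri = "qemu:///system";
        }
}
```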
Does anyone have the same configuration, with guest VMs on multiple
hosts? If so, how did you manage to get it to work? What do we need to
do to resolve this? Is there maybe even someone who would be willing to
take a closer look at our servers? Any help would be greatly appreciated!
Kind regards
Stefan Schmitz
On 03.07.2020 at 02:39, Ken Gaillot wrote:
On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com wrote:
Hello,
I hope someone can help with this problem. We are (still) trying to get
Stonith to achieve a running active/active HA cluster, but sadly to no avail.
There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM. The
Ubuntu VMs are the ones which should form the HA cluster.
The current status is this:
# pcs status
Cluster name: pacemaker_cluster
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Thu Jul 2 17:03:53 2020
Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1
2 nodes configured
13 resources configured
Online: [ server2ubuntu1 server4ubuntu1 ]
Full list of resources:
 stonith_id_1 (stonith:external/libvirt): Stopped
 Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
     Masters: [ server4ubuntu1 ]
     Slaves: [ server2ubuntu1 ]
 Master/Slave Set: WebDataClone [WebData]
     Masters: [ server2ubuntu1 server4ubuntu1 ]
 Clone Set: dlm-clone [dlm]
     Started: [ server2ubuntu1 server4ubuntu1 ]
 Clone Set: ClusterIP-clone [ClusterIP] (unique)
     ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
     ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
 Clone Set: WebFS-clone [WebFS]
     Started: [ server4ubuntu1 ]
     Stopped: [ server2ubuntu1 ]
 Clone Set: WebSite-clone [WebSite]
     Started: [ server4ubuntu1 ]
     Stopped: [ server2ubuntu1 ]
Failed Actions:
* stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
    status=Error, exitreason='',
    last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms
* r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203,
    status=complete, exitreason='',
    last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
* stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
    status=Error, exitreason='',
    last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms
The stonith resource is stopped and does not seem to work.
On both hosts the command
# fence_xvm -o list
kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
This should show both VMs, so getting to that point will likely solve
your problem. fence_xvm relies on multicast, there could be some
obscure network configuration to get that working on the VMs.
returns the local VM. Apparently it connects through the virtualization
interface, because it returns the VM name, not the hostname of the
client VM. I do not know if this is how it is supposed to work?
Yes, fence_xvm knows only about the VM names.
To get pacemaker to be able to use it for fencing the cluster nodes,
you have to add a pcmk_host_map parameter to the fencing resource. It
looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
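A sketch against the resource above (note: kvm101 is only my guess at the second VM's libvirt domain name, since just kvm102 appears in your fence_xvm output; substitute the real names):

```
# kvm101 is a hypothetical domain name for the second VM
pcs stonith update stonith_id_1 \
    pcmk_host_map="server2ubuntu1:kvm102;server4ubuntu1:kvm101"
```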
In the local network, all traffic is allowed. No firewall is active
locally; only connections leaving the local network are firewalled.
Hence there are no connection problems between the hosts and clients.
For example, we can successfully connect from the clients to the hosts:
# nc -z -v -u 192.168.1.21 1229
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.1.21:1229.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
# nc -z -v -u 192.168.1.13 1229
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.1.13:1229.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
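As far as we understand, though, nc only proves that unicast UDP reaches port 1229; the actual fence_xvm request goes out via multicast. We assume the multicast path can be probed explicitly like this (225.0.0.12 and /etc/cluster/fence_xvm.key being the fence_virtd defaults):

```
# fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm.key -o list
```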
On the Ubuntu VMs we created and configured the stonith resource
according to the howto provided here:
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
The actual line we used:
# pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
    hostlist="Host4,host2" \
    hypervisor_uri="qemu+ssh://192.168.1.21/system"
But as you can see in the pcs status output, stonith is stopped and
exits with an unknown error.
Can somebody please advise on how to proceed, or what additional
information is needed to solve this problem?
Any help would be greatly appreciated! Thank you in advance.
Kind regards
Stefan Schmitz
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/