Re: [PVE-User] Cluster disaster

Thomas Lamprecht Mon, 14 Nov 2016 03:34:39 -0800


On 14.11.2016 11:50, Dhaussy Alexandre wrote:


On 11/11/2016 at 19:43, Dietmar Maurer wrote:

On November 11, 2016 at 6:41 PM Dhaussy Alexandre
<[email protected]> wrote:

you lost quorum, and the watchdog expired - that is how the watchdog
based fencing works.

I don't expect to lose quorum when _one_ node joins or leaves the cluster.

This was probably a long time before - but I have not read through the whole
logs ...

That makes no sense to me..
The fact is: everything has been working fine for weeks.


What I can see in the logs is: several cluster nodes rebooting
suddenly, exactly one minute after one node joined and/or left
the cluster.


The watchdog is set to a 60 second timeout, meaning that the cluster leave
caused quorum loss or other problems (you said you had multicast problems
around that time), so the LRM stopped updating the watchdog, and one minute
later the watchdog reset all nodes which had left the quorate partition.

I see no problems with corosync/lrm/crm before that.
This leads me to a probable network (multicast) malfunction.

I did a bit of homework, reading the wiki about the HA manager.

What I understand so far is that every state/service change from the LRM
must be acknowledged (cluster-wide) by the CRM master.


Yes and no. LRM and CRM are two state machines with synced inputs,
but that holds mainly for human-triggered commands and the resulting
communication.
Meaning that commands like start, stop, migrate may not go through from
the CRM to the LRM. Fencing and similar mechanisms work nonetheless,
else it would be a major design flaw :)

So if a multicast disruption occurs, and I assume the LRM wouldn't be able
to talk to the CRM master, then it also couldn't reset the watchdog, am I
right?



No, the watchdog runs on each node and is CRM independent.
As watchdogs are normally not able to serve more than one client, we wrote
the watchdog-mux (multiplexer).
This is a very simple C program which opens the watchdog with a
60 second timeout and allows multiple clients (at the moment CRM
and LRM) to connect to it.
If a client does not reset the dog for about 10 seconds, IIRC, the
watchdog-mux disables watchdog updates on the real watchdog.
After that a node reset will happen *when* the dog runs out of time,
not instantly.

So if the LRM cannot communicate (i.e. has no quorum) it will stop
updating the dog, thus triggering a reset independent of what the CRM
says or does.
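
To make the timing described above concrete, here is a small toy model in
Python. It is not the real watchdog-mux code (that is a C program); the class
and function names are made up for illustration, the 10 second client limit is
the "about 10 seconds, IIRC" figure from above, and the one-second tick is an
assumption of the model. It only shows why the reset comes roughly a minute
after a client goes silent instead of instantly.

```python
from dataclasses import dataclass, field

HW_TIMEOUT = 60      # seconds, hardware watchdog timeout (figure from the thread)
CLIENT_TIMEOUT = 10  # seconds, per-client limit ("about 10 seconds, IIRC")


@dataclass
class WatchdogMuxModel:
    """Toy model of the multiplexer; hypothetical, not the real implementation."""
    clients: dict = field(default_factory=dict)  # client name -> last update time
    last_hw_update: int = 0
    updates_enabled: bool = True

    def client_update(self, name, now):
        # A client (CRM or LRM) pets its virtual watchdog.
        self.clients[name] = now

    def tick(self, now):
        """Advance to time `now`; return True if the node would be reset."""
        if self.updates_enabled:
            if any(now - t > CLIENT_TIMEOUT for t in self.clients.values()):
                # One client went silent: stop feeding the real watchdog for good.
                self.updates_enabled = False
            else:
                self.last_hw_update = now
        # The hardware resets the node once it has not been fed for HW_TIMEOUT.
        return now - self.last_hw_update >= HW_TIMEOUT


def seconds_until_reset(lrm_stops_at, horizon=300):
    """Seconds between the LRM going silent and the reset, or None if no reset."""
    mux = WatchdogMuxModel(clients={"crm": 0, "lrm": 0})
    for now in range(horizon + 1):
        mux.client_update("crm", now)      # the CRM keeps updating throughout
        if now < lrm_stops_at:
            mux.client_update("lrm", now)  # the LRM stops, e.g. after quorum loss
        if mux.tick(now):
            return now - lrm_stops_at
    return None


if __name__ == "__main__":
    print(seconds_until_reset(lrm_stops_at=100))
```

In this model the CRM keeps updating the whole time, and the node is still
reset once the LRM goes silent, which matches the point that the reset does
not depend on the CRM. The exact delay the model reports depends on its tick
size and on where the 10 second window falls, so treat it as "about a minute",
not as a measured value.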

Another thing: I have checked my network configuration, the cluster IP
is set on a Linux bridge...
By default multicast_snooping is set to 1 on a Linux bridge, so I think
there's a good chance this is the source of my problems...
Note that we don't use IGMP snooping, it is disabled on almost all
network switches.


Yes, multicast snooping has to be configured (recommended) or else turned
off on the switch.
That's stated in some wiki articles, various forum posts and our docs, here:
http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements
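
For the Linux bridge side mentioned above, the current setting can be read
from sysfs. The following is a minimal Python sketch, assuming the bridge is
named vmbr0 (adjust to your setup) and that the kernel exposes the flag at
/sys/class/net/<bridge>/bridge/multicast_snooping. The function name and the
sysfs_root parameter are made up for this example.

```python
from pathlib import Path


def multicast_snooping_enabled(bridge, sysfs_root="/sys/class/net"):
    """Return True/False for the bridge's multicast_snooping flag.

    Returns None if the file cannot be read, e.g. the interface does not
    exist or is not a bridge.
    """
    path = Path(sysfs_root) / bridge / "bridge" / "multicast_snooping"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    # "vmbr0" is an assumed bridge name, not taken from the thread.
    print(multicast_snooping_enabled("vmbr0"))
```

This only reads the value. Writing 0 to the same file as root turns snooping
off on that bridge until the next reboot or network restart.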

Hope that helps a bit with understanding. :)

cheers,
Thomas

Plus I found a post by A. Derumier (yes, 3 years old...). He had
similar issues with bridges and multicast.
http://pve.proxmox.com/pipermail/pve-devel/2013-March/006678.html
_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

