Hi,

The cluster network is a separate network, so I would go with your idea of 
overload caused by many writes to an unreachable NFS server.
In the meantime we were able to reboot the node, and everything looks better 
now. The VMs were restarted as a result.

Next time I will give
        systemctl restart corosync pve-cluster
a try. Hopefully this will not reset any running VMs.

Immo
-----Original Message-----
From: pve-user [mailto:[email protected]] On Behalf Of Thomas 
Lamprecht
Sent: Tuesday, March 28, 2017 7:57 AM
To: [email protected]
Subject: Re: [PVE-User] Quorum lost cos of storage backbone problems

Hi,

On 03/23/2017 05:20 PM, IMMO WETZEL wrote:
> Our storage backbone had some problems, and during this one of the nodes lost its 
> quorum, maybe because many VMs on this host had a lot of NFS-mounted disks.
> How can I bring the host back into the cluster without rebooting?

Was the cluster network on the storage backbone (I assume you mean network 
here)?

If not, the loss could be the result of heavy load on the node caused by 
the outage.
Otherwise this would be weird, as quorum does not depend directly on the running (or 
failed) VMs.

I'd check whether the problematic node can send to the other nodes via multicast [1]; 
then restarting corosync and possibly pve-cluster should do it:

systemctl restart corosync pve-cluster
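
As a rough sketch, the check-then-restart sequence could be wrapped like this. 
The node names (node1 node2 node3) are hypothetical placeholders, and the run 
and rejoin_cluster helpers are made up for illustration. It assumes omping is 
installed and is started on all nodes at roughly the same time, since the 
multicast test only works that way. By default the script only prints the 
commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: NODES holds hypothetical node names, replace with your own.
NODES="${NODES:-node1 node2 node3}"
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: check quorum and multicast, then restart the cluster stack.
rejoin_cluster() {
    # 1. Show the current quorum state of this node.
    run pvecm status
    # 2. Multicast test between the nodes (about 10 minutes at 1 packet/s).
    #    $NODES is intentionally unquoted so each name becomes its own argument.
    run omping -c 600 -i 1 -q $NODES
    # 3. Restart the cluster services, as suggested above.
    run systemctl restart corosync pve-cluster
    # 4. Check whether the node is quorate again.
    run pvecm status
}

rejoin_cluster
```

If omping already reports heavy multicast loss, restarting the services is 
unlikely to help until the network problem itself is fixed.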

cheers,
Thomas

[1]
http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user