Hi,
The cluster network is a separate network, so I would go with your idea that
the node was overloaded by many writes to an unreachable NFS server.
In the meantime we were able to reboot the node, and everything looks better
now. The VMs were restarted as a result.
Next time I will give
systemctl restart corosync pve-cluster
a try. Hopefully this will not reset any running VMs.
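For next time, here is a rough sketch of the order I would try, not a tested procedure. It only wraps the restart command from this thread between two `pvecm status` calls so the quorum state can be compared before and after. The `run` and `rejoin_cluster` helpers and the `DRY_RUN` variable are my own naming, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: try to bring a node back into the cluster without a reboot.
# DRY_RUN=1 (the default, an assumption of this sketch) only prints the
# commands; set DRY_RUN=0 to actually run them on the affected node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: status, restart, status again.
rejoin_cluster() {
    # 1. Show the current membership and quorum state.
    run pvecm status
    # 2. Restart the cluster stack as suggested in the thread.
    run systemctl restart corosync pve-cluster
    # 3. Check whether the node is quorate again.
    run pvecm status
}

# The multicast check from [1] is not part of this sketch; it should be
# done before the restart.
rejoin_cluster
```

Running it as is prints the three commands; `DRY_RUN=0 sh script.sh` would execute them.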
Immo
-----Original Message-----
From: pve-user [mailto:[email protected]] On Behalf Of Thomas Lamprecht
Sent: Tuesday, March 28, 2017 7:57 AM
To: [email protected]
Subject: Re: [PVE-User] Quorum lost cos of storage backbone problems
Hi,
On 03/23/2017 05:20 PM, IMMO WETZEL wrote:
> Our storage backbone had some problems; during this, one of the nodes lost its
> quorum, maybe because many VMs on this host had a lot of NFS-mounted disks.
> How can I bring the host back into the cluster without rebooting?
Was the cluster network on the storage backbone (I assume you mean network
here)?
If not, the loss could be the result of heavy load on the node resulting from
the outage.
Else this would be weird, as quorum does not depend directly on the running (or
failed) VMs.
I'd check whether the problematic node can send to the other nodes via
multicast [1]; then restarting corosync, and possibly pve-cluster, should do it:
systemctl restart corosync pve-cluster
cheers,
Thomas
[1]
http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements
_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user