Hi,

It looks like the PVE cluster filesystem (pmxcfs) is out of sync or has a problematic connection to corosync. From corosync's standpoint the node addition worked and is consistent on all nodes, which is good.
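A quick way to see the mismatch is to compare, on each node, corosync's view of the membership with what pmxcfs exposes:

# pvecm status
# cat /etc/pve/.members

The first reflects corosync, the second the cluster filesystem; going by your attachments they only disagree on proxmox03.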

Now, some logs from `proxmox03` - the problematic node - would be nice:

# journalctl -u corosync -u pve-cluster
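If those units have a lot of history, journalctl's --since option can narrow the output to the relevant time window, for example (the date is just a placeholder):

# journalctl -u corosync -u pve-cluster --since "2017-05-01"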

As I may not get back to you until Tuesday, I'll give you a possible resolution now:

I'd suggest restarting pve-cluster, *but* as pve-ha-lrm and its watchdog are active on node proxmox03, this could result in a node reset *if* pve-cluster cannot connect back to corosync or fails in another way. That is not likely, but if you cannot schedule a maintenance window it should be taken care of.
First restart pve-ha-crm so that another node takes over the master role.
Then either move all HA services away from this node or remove them temporarily, and stop the pve-ha-lrm and pve-ha-crm services (see the sketch below):

# systemctl stop pve-ha-lrm pve-ha-crm
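To flesh that out a bit - vm:100 and proxmox01 below are only placeholders for your actual HA service IDs and a target node, and it's worth double-checking the ha-manager subcommands against your version:

# systemctl restart pve-ha-crm
# ha-manager status

Wait until another node reports itself as master, then either migrate the services away:

# ha-manager migrate vm:100 proxmox01

or note their configuration and remove them temporarily:

# ha-manager config
# ha-manager remove vm:100

Only then run the stop command above.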

Now restart the pve-cluster service and, just to be sure, corosync as well:

# systemctl restart pve-cluster corosync
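If you want to confirm both services came back cleanly before looking at the cluster filesystem, a quick check could be:

# systemctl status pve-cluster corosync
# pvecm status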

Then check:

# cat /etc/pve/.members
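For reference, .members is a small JSON file roughly of this shape (all values below are made up):

{
"nodename": "proxmox03",
"version": 12,
"cluster": { "name": "yourcluster", "version": 5, "nodes": 5, "quorate": 1 },
"nodelist": {
  "proxmox01": { "id": 1, "online": 1, "ip": "192.168.1.1"},
  "proxmox02": { "id": 2, "online": 1, "ip": "192.168.1.2"},
  ...
  }
}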

It should show all members and the same version number as on the other nodes. If that's the case, start pve-ha-lrm and pve-ha-crm again, and everything should be back in order.

Oh and sorry for getting back a bit late :)

cheers,
Thomas


On 05/05/2017 09:38 AM, Mark Schouten wrote:
> Thomas, pretty please? :)
>
> On Wed, 2017-05-03 at 09:45 +0200, Mark Schouten wrote:
>> On Tue, 2017-05-02 at 09:05 +0200, Thomas Lamprecht wrote:
>>> Can you check that the config looks the same on all nodes?
>>> Especially the difference between working and misbehaving nodes
>>> would be interesting.
>>
>> Please see the attachment. That includes /etc/pve/.members and
>> /etc/pve/corosync.conf from all nodes. Only the .members file of the
>> misbehaving node is off.
>>
>>> In general you could just restart the CRM, but the CRM is capable of
>>> syncing in new nodes while running, so there shouldn't be any need
>>> for that; the patches you linked also do not change that, AFAIK.
>>
>> I would like to do a sync without a restart as well, but what would
>> trigger this?
>>
>>> As /etc/pve/.members doesn't show the new node on the misbehaving one,
>>> the problem is another one.
>>> Who is the current master? Can you give me an output of:
>>> # ha-manager status
>>> # pvecm status
>>> # cat /etc/pve/corosync.conf
>>
>> Output in the attachment. Because the misbehaving node is also the
>> master, the output of ha-manager is identical on all nodes.


_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

