[Resending the below due to message format problem]

Dear List,

I have been running two different 3-node clusters for some time. I am having a fatal problem with corosync: after a node failure, the rebooted node does NOT start corosync.

Clusters:

 * All nodes are running Ubuntu Server 24.04
 * corosync is 3.1.7
 * corosync-qdevice is 3.0.3
 * pacemaker is 2.1.6
 * The third node in both clusters is a quorum device. The clusters use
   the ffsplit algorithm.
 * All nodes are bare metal and attached to a dedicated kronosnet network.
 * STONITH is enabled in one of the clusters and disabled for the other.

Automatic start of the corosync and pacemaker services (systemd) is disabled. I start each cluster with the command pcs cluster start.
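For anyone who wants to confirm the same setup on their own nodes, here is a minimal sketch that prints the systemd enablement state of both units. The function name show_cluster_unit_state is only illustrative, and the unit names corosync and pacemaker are assumed to match the packaged service names.

```shell
# Print the systemd enablement state of each cluster unit, one per line.
# Assumption: the units are named "corosync" and "pacemaker".
show_cluster_unit_state() {
    for unit in corosync pacemaker; do
        # is-enabled exits non-zero for "disabled", so ignore the exit code.
        state=$(systemctl is-enabled "$unit" 2>/dev/null || true)
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}
```

Calling show_cluster_unit_state on each node before running pcs cluster start shows whether either service could have been started by systemd at boot.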

corosync NEVER starts AFTER a node failure (once the node has been rebooted). There is nothing informative in /var/log/corosync/corosync.log; the service freezes at:

Aug 01 12:54:56 [3193] charon corosync notice  [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Aug 01 12:54:56 [3193] charon corosync info    [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
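To make it easier to compare a stalled start with a healthy one, here is a small sketch that reports the subsystem tag of the last corosync log line. It assumes the line layout shown in the excerpt above (hostname, then "corosync", a priority word, then the tag in square brackets); the function name last_corosync_subsystem is hypothetical.

```shell
# Print the subsystem tag (for example MAIN) of the last corosync log line
# read from stdin. Assumes the log line layout shown in the excerpt above.
last_corosync_subsystem() {
    # Keep only lines that match the layout, reduced to the bare tag,
    # then take the last one.
    sed -n 's/^.*corosync [a-z]* *\[\([A-Z]*\) *\].*$/\1/p' | tail -n 1
}
```

It can be run as last_corosync_subsystem < /var/log/corosync/corosync.log. For the two lines above it prints MAIN, meaning no other subsystem logged anything before the freeze.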

corosync never starts kronosnet. I checked the kronosnet interfaces and they are all OK; there is IP connectivity between the nodes. Running corosync -t freezes in the same way.

The ONLY way I have managed to start corosync is by reinstalling it: apt reinstall corosync ; pcs cluster start.

This issue has occurred at least 5-6 times. I do NOT see anything in syslog either. I would be grateful for any guidance on how to solve this.

Thanks,

Murat
