Hi,

We see the following messages almost every day in our two-node cluster, and resources get migrated when it happens:

[16187] node1 corosyncwarning [MAIN ] Corosync main process was not scheduled for 2889.8477 ms (threshold is 800.0000 ms). Consider token timeout increase.
[16187] node1 corosyncnotice [TOTEM ] c.
[16187] node1 corosyncnotice [TOTEM ] A new membership (192.168.0.1:1268) was formed. Members joined: 2 left: 2
[16187] node1 corosyncnotice [TOTEM ] Failed to receive the leave message. failed: 2

After setting the token timeout to 6000 ms, at least the "Failed to receive the leave message" notice no longer appears. But we still see corosync timeout warnings:

[16395] node1 corosyncwarning [MAIN ] Corosync main process was not scheduled for 6660.9043 ms (threshold is 4800.0000 ms). Consider token timeout increase.

1. Why is the configured timeout not in effect? The threshold shown is 4800 ms instead of 6000 ms.
2. How can we fix this? There is not much load on the nodes, and corosync is already running with RT priority.

Details of the OS and packages:

Kernel: 3.10.0-957.el7.x86_64
OS: Oracle Linux Server 7.6
corosync-2.4.3-4.el7.x86_64
corosynclib-2.4.3-4.el7.x86_64

Thanks in advance.

--
Regards,
Jeevan.
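
To quantify how long and how often these pauses occur, the durations can be pulled out of the log with a small script. This is only a minimal sketch: it assumes the warning text has exactly the form quoted above, and the names `PAUSE_RE` and `parse_pause` are hypothetical helpers, not part of corosync. It also prints the reported threshold as a fraction of the configured 6000 ms token timeout, for comparison.

```python
import re

# Matches the scheduling-pause warning as quoted in this message.
# Assumption: the wording "not scheduled for X ms (threshold is Y ms)"
# is stable across the log lines being scanned.
PAUSE_RE = re.compile(
    r"not scheduled for (?P<pause>[\d.]+) ms "
    r"\(threshold is (?P<threshold>[\d.]+) ms\)"
)


def parse_pause(line):
    """Return (pause_ms, threshold_ms), or None if the line is not a pause warning."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    return float(match.group("pause")), float(match.group("threshold"))


if __name__ == "__main__":
    # Log line copied from the message above.
    line = (
        "[16395] node1 corosyncwarning [MAIN ] Corosync main process was not "
        "scheduled for 6660.9043 ms (threshold is 4800.0000 ms). "
        "Consider token timeout increase."
    )
    token_ms = 6000  # token timeout configured on these nodes
    pause_ms, threshold_ms = parse_pause(line)
    print(f"pause: {pause_ms:.1f} ms, threshold: {threshold_ms:.1f} ms")
    print(f"threshold / token: {threshold_ms / token_ms:.2f}")
```

Running the same function over every line of the corosync log would give a list of pause durations that can be compared against the token timeout.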
