Re: [ClusterLabs] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.

Jan Friesse Wed, 17 Feb 2016 08:50:12 -0800

Kostiantyn Ponomarenko wrote:

Thank you for the suggestion.
The OS is Debian 8. All packages are built by myself.
libqb-0.17.2
corosync-2.3.5
cluster-glue-1.0.12
pacemaker-1.1.13


It is really important for me to understand what is happening with the
cluster under the high load.

For Corosync it's really simple. Corosync has to be scheduled by the OS regularly (more often than its current token timeout) to be able to detect membership changes and send/receive messages (cpg). If it's not scheduled, membership is not up to date, and when it's finally scheduled, it logs a "process was not scheduled for ... ms" message (a warning for the user). If corosync was not scheduled for more than the token timeout, a "Process pause detected for ..." message is displayed and a new membership is formed. Other nodes (if scheduled regularly) see the irregularly scheduled node as dead.
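
To make the two cases above concrete, here is a rough Python sketch (not corosync code) that pulls the pause and threshold out of the warning quoted in this thread and tells which case applies. The function names and the 1000 ms token timeout are assumptions for illustration only; the real token timeout is whatever is configured for the cluster.

```python
import re

# Matches the warning text quoted in this thread.
NOT_SCHEDULED = re.compile(
    r"not scheduled for (?P<pause>[\d.]+) ms "
    r"\(threshold is (?P<threshold>[\d.]+) ms\)"
)


def parse_not_scheduled(line):
    """Return (pause_ms, threshold_ms), or None if the line is another message."""
    match = NOT_SCHEDULED.search(line)
    if match is None:
        return None
    return float(match.group("pause")), float(match.group("threshold"))


def classify_pause(pause_ms, threshold_ms, token_ms):
    """Hypothetical helper: map a scheduling pause to the outcome described above."""
    if pause_ms > token_ms:
        # Longer than the token timeout: other nodes form a new membership.
        return "membership-change"
    if pause_ms > threshold_ms:
        # Above the warning threshold only: the warning is logged.
        return "warning"
    return "ok"


line = ("process was not scheduled for 12483.7363 ms "
        "(threshold is 800.0000 ms). Consider token timeout increase.")
pause_ms, threshold_ms = parse_not_scheduled(line)
ASSUMED_TOKEN_MS = 1000.0  # assumption, not taken from the poster's config
print(classify_pause(pause_ms, threshold_ms, ASSUMED_TOKEN_MS))
```

With the 12483.7363 ms pause from the quoted log, any token timeout shorter than about 12.5 seconds puts the node in the second case, so the other nodes would have considered it dead.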

So I would appreciate any help here =)

There is really no help. It's best to make sure corosync is scheduled regularly.



Thank you,
Kostia

On Wed, Feb 17, 2016 at 5:02 PM, Greg Woods <[email protected]> wrote:


On Wed, Feb 17, 2016 at 3:30 AM, Kostiantyn Ponomarenko <
[email protected]> wrote:

Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN  ] Corosync main
process was not scheduled for 12483.7363 ms (threshold is 800.0000 ms).
Consider token timeout increase.



I was having this problem as well. You don't say which version of corosync
you are running or on what OS, but on CentOS 7, there is an available

This update sets round robin realtime scheduling for corosync by default. The same can be achieved without the update by editing /etc/sysconfig/corosync and changing the COROSYNC_OPTIONS line to something like COROSYNC_OPTIONS="-r"
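
One way to check whether the change took effect is to ask the kernel which scheduling policy a running process has. The following is a minimal sketch for Linux using Python's standard os module; the helper names are made up for this example, and the PID you pass in is whatever your corosync process happens to have.

```python
import os


def policy_name(policy):
    """Translate a scheduling policy number into its symbolic name."""
    # The kernel may OR in this flag; strip it before the lookup.
    policy &= ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    names = {}
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR",
                 "SCHED_BATCH", "SCHED_IDLE"):
        value = getattr(os, name, None)
        if value is not None:
            names[value] = name
    return names.get(policy, "unknown (%d)" % policy)


def scheduling_policy(pid):
    """Return the policy name for a PID (0 means the calling process)."""
    return policy_name(os.sched_getscheduler(pid))


# Checks this script itself; substitute the corosync PID to check the daemon.
print(scheduling_policy(0))
```

A process running with round robin realtime scheduling should report SCHED_RR, while an ordinary process reports SCHED_OTHER.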


Regards,
  Honza

update that looks like it might address this (it has to do with
scheduling). We haven't gotten around to actually applying it yet because
it will require some down time on production services (we do have a few
node-locked VMs in our cluster), and it only happens when the system is
under very high load, so I can't say for sure the update will fix the
issue, but it might be worth looking into.

--Greg


_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org







