Kostiantyn Ponomarenko wrote:
Thank you for the suggestion.
The OS is Debian 8. All packages were built by myself.
libqb-0.17.2
corosync-2.3.5
cluster-glue-1.0.12
pacemaker-1.1.13
It is really important for me to understand what is happening with the
cluster under high load.
For Corosync it's really simple. Corosync has to be scheduled by the OS
regularly (more often than its current token timeout) to be able to
detect membership changes and send/receive messages (cpg). If it's not
scheduled, membership is not up to date, and when it's finally
scheduled it logs a "process was not scheduled for ... ms" message
(a warning for the user). If corosync was not scheduled for more than the
token timeout, a "Process pause detected for ..." message is displayed and
a new membership is formed. Other nodes (if scheduled regularly) see the
irregularly scheduled node as dead.
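The two conditions described above can be sketched roughly as follows. This is only an illustration, not corosync's actual code: the function name `scheduling_warnings` is hypothetical, and the 0.8 ratio and 1000 ms token timeout are assumptions inferred from the "threshold is 800.0000 ms" value in the log quoted in this thread.

```python
def scheduling_warnings(gap_ms, token_timeout_ms, threshold_ratio=0.8):
    """Return the warnings a scheduling gap of gap_ms would trigger.

    threshold_ratio=0.8 is an assumption (800 ms threshold with a
    1000 ms token timeout); it is not taken from corosync's source.
    """
    warnings = []
    threshold_ms = token_timeout_ms * threshold_ratio
    # Gap above the warning threshold: the process missed its schedule.
    if gap_ms > threshold_ms:
        warnings.append(
            f"not scheduled for {gap_ms:.4f} ms "
            f"(threshold is {threshold_ms:.4f} ms)"
        )
    # Gap above the token timeout: other nodes will already have
    # declared this node dead, so a new membership has to be formed.
    if gap_ms > token_timeout_ms:
        warnings.append(f"pause detected for {gap_ms:.4f} ms")
    return warnings


if __name__ == "__main__":
    for message in scheduling_warnings(12483.7363, token_timeout_ms=1000):
        print(message)
```

With the 12483.7363 ms gap from the log and an assumed 1000 ms token timeout, both conditions are met, which matches the observation that the node drops out of the membership under load.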
So I would appreciate any help here =)
There is really no help. It's best to make sure corosync is scheduled
regularly.
Thank you,
Kostia
On Wed, Feb 17, 2016 at 5:02 PM, Greg Woods <[email protected]> wrote:
On Wed, Feb 17, 2016 at 3:30 AM, Kostiantyn Ponomarenko <
[email protected]> wrote:
Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main
process was not scheduled for 12483.7363 ms (threshold is 800.0000 ms).
Consider token timeout increase.
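To see how far such a gap exceeds the threshold, the two numbers can be pulled out of a log line like the one above. This is a minimal sketch; the helper name `parse_sched_miss` is hypothetical, and the pattern is based only on the message format quoted here.

```python
import re

# Matches the "not scheduled for X ms (threshold is Y ms)" part of the message.
PATTERN = re.compile(
    r"not scheduled for (\d+(?:\.\d+)?) ms "
    r"\(threshold is (\d+(?:\.\d+)?) ms\)"
)


def parse_sched_miss(text):
    """Return (missed_ms, threshold_ms) or None if the message is absent."""
    # The log line may be wrapped by the mail client, so collapse whitespace.
    flat = " ".join(text.split())
    match = PATTERN.search(flat)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    line = """Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main
    process was not scheduled for 12483.7363 ms (threshold is 800.0000 ms).
    Consider token timeout increase."""
    missed, threshold = parse_sched_miss(line)
    print(f"missed {missed} ms, {missed / threshold:.1f}x the threshold")
```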
I was having this problem as well. You don't say which version of corosync
you are running or on what OS, but on CentOS 7, there is an available
update that looks like it might address this (it has to do with
scheduling). We haven't gotten around to actually applying it yet because
it will require some down time on production services (we do have a few
node-locked VMs in our cluster), and it only happens when the system is
under very high load, so I can't say for sure the update will fix the
issue, but it might be worth looking into.
--Greg
This update sets round-robin realtime scheduling for corosync by
default. The same can be achieved without the update by editing
/etc/sysconfig/corosync and changing the COROSYNC_OPTIONS line to something
like COROSYNC_OPTIONS="-r"
Regards,
Honza
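One way to check whether corosync is actually running with round-robin realtime scheduling (as the `-r` option mentioned in this thread is said to set) is to query the process's scheduling policy. The sketch below is Linux-only and uses Python's standard `os.sched_getscheduler`; the helper names and the PID in the usage line are illustrative assumptions.

```python
import os

# Map the Linux scheduling policy constants to readable names.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",  # default time-sharing policy
    os.SCHED_FIFO: "SCHED_FIFO",    # realtime, first in first out
    os.SCHED_RR: "SCHED_RR",        # realtime, round robin
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def policy_name(policy):
    """Translate a numeric policy into its name, or 'UNKNOWN(n)'."""
    return POLICY_NAMES.get(policy, f"UNKNOWN({policy})")


def current_policy(pid):
    """Return the scheduling policy name of pid (0 means this process)."""
    return policy_name(os.sched_getscheduler(pid))


if __name__ == "__main__":
    # Replace 0 with the corosync PID (for example, the one shown in the
    # log prefix "corosync[2742]") to inspect the running daemon.
    print(current_policy(0))
```

If the result for the corosync process is `SCHED_RR`, round-robin realtime scheduling is in effect; `SCHED_OTHER` would suggest it is still competing with ordinary processes under load.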
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org