Jan Friesse <jfrie...@redhat.com> writes:

> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled
>> for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout
>> increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. [...]

Before I start tracing the scheduler, I'd like to ask something: what
wakes up the Corosync main process periodically?  The token making a
full circle?  (Please forgive my simplistic understanding of the TOTEM
protocol.)  That would explain the recommendation in the log message,
but it does not fit well with the overload assumption: totally idle
nodes could just as easily produce such warnings if there are no other
regular wakeup sources.  (I'm looking at
timer_function_scheduler_timeout, but I know too little of libqb to
decide.)

> As a start you can try what message say = Consider token timeout
> increase. Currently you have 3 seconds, in theory 6 second should be
> enough.

It was probably high time I realized that the token timeout is scaled
automatically when one has a nodelist.  When you say Corosync should
work OK with default settings up to 16 nodes, you assume this scaling
is in effect, don't you?  On the other hand, I've got no nodelist in
the config, only token = 3000, which is less than the scaled default of
1000+4*650 = 3600 ms for six nodes, and the gap will widen as the
cluster grows.

Comments on the above ramblings are welcome!

I'm grateful for all the valuable input poured into this thread by all
parties: it has proven really educational in quite unexpected ways,
beyond what I was able to ask in the beginning.
-- 
Thanks,
Feri
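
P.S. To make the arithmetic above concrete, here is a minimal sketch of
the scaling as I understand it from corosync.conf(5): with a nodelist of
at least three nodes, the runtime token timeout is
token + (nodes - 2) * token_coefficient.  The function names are made up
for illustration, the defaults (1000 ms and 650 ms) are the ones quoted
above, and the 0.8 factor for the "not scheduled" threshold is only my
inference from the log (2400 ms with token = 3000), not something I have
verified in the source.

```python
def effective_token_ms(nodes, token=1000, token_coefficient=650,
                       has_nodelist=True):
    """Token timeout (ms) after the automatic scaling.

    Hypothetical helper: scaling is assumed to apply only when a
    nodelist with at least 3 nodes is configured.
    """
    if has_nodelist and nodes >= 3:
        return token + (nodes - 2) * token_coefficient
    return token


def pause_threshold_ms(token_ms, factor=0.8):
    """Scheduling-pause warning threshold (ms).

    The 0.8 factor is an assumption inferred from the quoted log line.
    """
    return token_ms * factor


if __name__ == "__main__":
    # Scaled default for six nodes with a nodelist.
    print(effective_token_ms(6))
    # My current setup: no nodelist, token = 3000, so no scaling.
    mine = effective_token_ms(6, token=3000, has_nodelist=False)
    print(mine, pause_threshold_ms(mine))
    # Scaled default for the 16-node case mentioned in the thread.
    print(effective_token_ms(16))
```

If these assumptions hold, my fixed token = 3000 sits below the scaled
default already at six nodes, and the difference grows by 650 ms with
every node added.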