Re: [ClusterLabs] [corosync][Problem] Very long "pause detect ... " was detected.

renayama19661014 Mon, 13 Jun 2016 02:54:39 -0700

Hi Honza,

Thank you for your comment.



>>  Our user constituted a cluster in corosync and Pacemaker in the next environment.
>>  The cluster constituted it among guests.
>> 
>>  * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
>>  * libqb 0.17.1
>>  * corosync 2.3.4
>>  * Pacemaker 1.1.12
>> 
>>  The cluster worked well.
>>  When a user stopped an active guest, the next log was output in standby guests repeatedly.
> 
> What exactly you mean by "active guest" and "standby guests"?

The cluster uses an active/standby configuration.

The standby guest stays idle until a resource fails on the active guest.


The problem seemed to occur when the resources failed over to the standby guest.


> 
>> 
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
>>  (snip)
>>  May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
>>  May xx xx:26:03 standby-guest corosync[6311]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>  May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
>>  (snip)
>> 
> 
> This is weird. Not because of enormous pause length but because corosync
> has a "scheduler pause" detector which warns before "Process pause
> detected ..." error is logged.

I thought so, too.
However, a "scheduler pause" does not seem to have taken place.

> 
>>  As a result, the standby guest failed in the construction of the independent cluster.
>> 
>>  It is recorded in log as if a timer stopped for 91 minutes.
>>  It is abnormal length for 91 minutes.
>> 
>>  Did you see a similar problem?
> 
> Never

Okay!
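
As a quick check of the "91 minutes" figure, the following small sketch converts the first few pause values from the quoted log into minutes and shows how much each message grows over the previous one. The values are copied from the log; the helper names are only for illustration.

```python
# Pause values (milliseconds) copied from the first log lines quoted above.
pauses_ms = [5515870, 5515920, 5515971, 5516021, 5516071, 5516121]


def ms_to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000


def deltas(values):
    """Differences between consecutive entries."""
    return [b - a for a, b in zip(values, values[1:])]


first_minutes = ms_to_minutes(pauses_ms[0])
steps = deltas(pauses_ms)

print(f"first reported pause: {first_minutes:.1f} minutes")
print(f"growth between messages: {steps} ms")
```

The reported pause grows by about 50 ms from one message to the next, so the value seems to follow elapsed time from one fixed starting point rather than describe separate pauses.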


> 
>> 
>>  Possibly I think whether it is libqb or Kernel or some kind of problems.
> 
> What virtualization technology are you using? KVM?
> 
>>  * I suspect that the set of the timer failed in reset_pause_timeout().
> 
> You can try to put asserts into this function, but there is really not 
> too much reasons why it should fail (ether malloc returns NULL or some 
> nasty memory corruption).


I read the source code, too.
I came to the same conclusion as you.
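
To illustrate why I suspect the timer, here is a toy model of a pause detector. It is not corosync's code: the class, the method names, and the 200 ms / 500 ms values are all hypothetical. It only assumes that a periodic timer refreshes a timestamp and that message handling reports a pause when that timestamp is older than a threshold. If re-arming the timer fails once, the timestamp is never refreshed again, and every later check reports an ever-growing "pause".

```python
class PauseDetector:
    """Toy model (hypothetical, not corosync code) of a timestamp-based pause check."""

    def __init__(self, threshold_ms, now_ms=0):
        self.threshold_ms = threshold_ms
        self.last_tick_ms = now_ms   # refreshed by the periodic timer
        self.timer_armed = True

    def on_timer(self, now_ms, rearm_ok=True):
        """Periodic timer callback: refresh the timestamp, then try to re-arm."""
        if not self.timer_armed:
            return                   # a timer that was not re-armed never fires again
        self.last_tick_ms = now_ms
        self.timer_armed = rearm_ok  # rearm_ok=False models a failed re-arm

    def check(self, now_ms):
        """Return the apparent pause in ms if it exceeds the threshold, else None."""
        age = now_ms - self.last_tick_ms
        return age if age > self.threshold_ms else None


def simulate():
    """Timer every 200 ms, a check every 50 ms; re-arming fails at t=1000 ms."""
    det = PauseDetector(threshold_ms=500)
    reports = []
    for now in range(0, 3001, 50):
        if now % 200 == 0:
            det.on_timer(now, rearm_ok=(now < 1000))
        pause = det.check(now)
        if pause is not None:
            reports.append(pause)
    return reports


reports = simulate()
print(reports[:3], "...", reports[-1])
```

In this model the process itself is never paused, yet the reported value keeps increasing in step with the checks, which resembles the pattern in the log. It does not prove that the same thing happened in the real cluster.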

I do not know whether the problem will reproduce, but I will set up the same 
configuration on RHEL6.6 and run it under load this week.

If you notice anything, please send me an email.

Best Regards,
Hideo Yamauchi.


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
