Re: [ClusterLabs] [corosync][Problem] Very long "pause detect ... " was detected.

2016-06-13 Thread renayama19661014
Hi Honza,

Thank you for your comment.


>>  Our user built a cluster with corosync and Pacemaker in the following
>>  environment.
>>  The cluster was built among guests.
>> 
>>  * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
>>  * libqb 0.17.1
>>  * corosync 2.3.4
>>  * Pacemaker 1.1.12
>> 
>>  The cluster worked well.
>>  When a user stopped an active guest, the following log was output
>>  repeatedly on the standby guests.
> 
> What exactly do you mean by "active guest" and "standby guests"?

The cluster has an active/standby configuration.

The standby guest waits until a resource fails on the active guest.

This problem seemed to occur when the resources were taken over by the
standby guest.


> 
>> 
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
>>  (snip)
>>  May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
>>  May xx xx:26:03 standby-guest corosync[6311]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>  May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
>>  (snip)
>> 
> 
> This is weird. Not because of the enormous pause length, but because corosync
> has a "scheduler pause" detector which warns before the "Process pause
> detected ..." error is logged.

I thought so, too.
However, a "scheduler pause" does not seem to have occurred.

> 
>>  As a result, the standby guest failed to form a cluster on its own.
>> 
>>  The log reads as if a timer had stopped for 91 minutes.
>>  91 minutes is an abnormally long time.
>> 
>>  Have you seen a similar problem?
> 
> Never

Okay!


> 
>> 
>>  I wonder whether it is a problem in libqb, the kernel, or something else.
> 
> What virtualization technology are you using? KVM?
> 
>>  * I suspect that setting the timer failed in reset_pause_timeout().
> 
> You can try to put asserts into this function, but there are really not
> many reasons why it should fail (either malloc returns NULL or there is some
> nasty memory corruption).


I read the source code, too.
It is just as you say.

I do not know whether the problem will reproduce, but I intend to build the
same setup on RHEL6.6 and put it under load this week.
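
For the reproduction run, a small script along these lines could summarize the
"Process pause detected" messages. It is only a sketch: the regular expression
assumes the log format quoted in this thread with each message on a single
line, and the function name pause_durations_ms is made up for illustration.

```python
import re

# Matches the TOTEM message quoted in this thread; assumes one message per line.
PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")


def pause_durations_ms(lines):
    """Return the reported pause durations (in ms) found in the given log lines."""
    durations = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations


if __name__ == "__main__":
    # First and last pause lines quoted above (xx:25:53 and xx:26:03).
    sample = [
        "May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause "
        "detected for 5515870 ms, flushing membership messages.",
        "May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause "
        "detected for 5526222 ms, flushing membership messages.",
    ]
    found = pause_durations_ms(sample)
    print("first pause: %.1f minutes" % (found[0] / 60000.0))
    print("growth between the two lines: %d ms" % (found[-1] - found[0]))
```

On these two lines the first value works out to about 91.9 minutes, and the
reported pause grows by 10352 ms while the log timestamps advance by about
10 seconds, so the reported value appears to increase at roughly wall-clock
speed.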

If you notice anything, please send me an email.

Best Regards,
Hideo Yamauchi.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [corosync][Problem] Very long "pause detect ... " was detected.

2016-06-13 Thread Jan Friesse

Hideo,


> Hi All,
>
> Our user built a cluster with corosync and Pacemaker in the following
> environment.
> The cluster was built among guests.
>
> * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
> * libqb 0.17.1
> * corosync 2.3.4
> * Pacemaker 1.1.12
>
> The cluster worked well.
> When a user stopped an active guest, the following log was output
> repeatedly on the standby guests.

What exactly do you mean by "active guest" and "standby guests"?



> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
> (snip)
> May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
> May xx xx:26:03 standby-guest corosync[6311]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
> May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
> (snip)



This is weird. Not because of the enormous pause length, but because corosync
has a "scheduler pause" detector which warns before the "Process pause
detected ..." error is logged.



> As a result, the standby guest failed to form a cluster on its own.
>
> The log reads as if a timer had stopped for 91 minutes.
> 91 minutes is an abnormally long time.
>
> Have you seen a similar problem?


Never



> I wonder whether it is a problem in libqb, the kernel, or something else.


What virtualization technology are you using? KVM?


> * I suspect that setting the timer failed in reset_pause_timeout().


You can try to put asserts into this function, but there are really not
many reasons why it should fail (either malloc returns NULL or there is some
nasty memory corruption).
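
To illustrate why a failed timer re-arm is a plausible suspect, here is a
simplified Python model of a timestamp-based pause detector. It is an
assumption about the general mechanism, not corosync's actual code: the class
PauseDetector, its methods, and the threshold value are all hypothetical. A
periodic timer refreshes a timestamp, and a check reports a pause when the
timestamp is older than a threshold. If the timer is never re-armed, the
timestamp goes stale and every later check reports an ever-growing pause.

```python
class PauseDetector:
    """Toy model: a periodic timer refreshes a timestamp; checks compare against it."""

    def __init__(self, threshold_ms, start_ms=0):
        self.threshold_ms = threshold_ms   # hypothetical threshold, not a corosync default
        self.last_refresh_ms = start_ms    # time of the last timer callback
        self.timer_armed = True

    def timer_fired(self, now_ms, rearm_ok=True):
        """Timer callback: refresh the timestamp, then try to re-arm the timer."""
        if not self.timer_armed:
            return
        self.last_refresh_ms = now_ms
        # Model a failed re-arm (e.g. allocation failure): the timer never fires again.
        # This is the kind of spot where an assert could catch the failure early.
        self.timer_armed = rearm_ok

    def check(self, now_ms):
        """Return the apparent pause in ms, or None if it is within the threshold."""
        elapsed = now_ms - self.last_refresh_ms
        return elapsed if elapsed > self.threshold_ms else None


def simulate(rearm_fails_at_ms, end_ms, step_ms=100, threshold_ms=500):
    """Fire the timer every step_ms; return the pauses reported by check()."""
    detector = PauseDetector(threshold_ms)
    reported = []
    for now in range(step_ms, end_ms + 1, step_ms):
        detector.timer_fired(now, rearm_ok=(now != rearm_fails_at_ms))
        pause = detector.check(now)
        if pause is not None:
            reported.append(pause)
    return reported


if __name__ == "__main__":
    # Re-arm fails at t=1000 ms; prints [600, 700, 800, 900, 1000]
    print(simulate(rearm_fails_at_ms=1000, end_ms=2000))
```

In this model the reported value keeps growing in step with the clock, which
resembles the log above, where the reported pause increases by about 50 ms per
message. Whether this is what actually happened in reset_pause_timeout()
remains unconfirmed.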


Regards,
  Honza



> Best Regards,
> Hideo Yamauchi.

