Hi Honza,
Thank you for your comment.
>> Our user constituted a cluster in corosync and Pacemaker in the next environment.
>> The cluster constituted it among guests.
>>
>> * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
>> * libqb 0.17.1
>> * corosync 2.3.4
>> * Pacemaker 1.1.12
>>
>> The cluster worked well.
>> When a user stopped an active guest, the next log was output in standby guests repeatedly.
>
> What exactly you mean by "active guest" and "standby guests"?
The cluster has an active/standby configuration.
The standby guest waits until a resource fails on the active guest.
The problem seemed to occur when the resources failed over to the standby guest.
>
>>
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
>> May xx xx:25:53 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
>> (snip)
>> May xx xx:26:03 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
>> May xx xx:26:03 standby-guest corosync[6311]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>> May xx xx:26:03 standby-guest corosync[6311]: [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
>> (snip)
>>
>
> This is weird. Not because of enormous pause length but because corosync
> has a "scheduler pause" detector which warns before "Process pause
> detected ..." error is logged.
I thought so, too.
However, a "scheduler pause" does not seem to have occurred.
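For reference, here is a small sketch of the arithmetic on the values in the log quoted above. It only uses numbers copied from those lines; the variable names are my own. The first value works out to roughly 91.9 minutes, the reports are about 50 ms apart, and the value grows by about 10.3 seconds between the xx:25:53 and xx:26:03 lines, which is close to the elapsed wall-clock time. To me this looks as if the reported pause is measured from a timestamp that is no longer being refreshed, but that is only a guess.

```python
# Values copied from the "Process pause detected for N ms" log lines.
pause_ms = [5515870, 5515920, 5515971, 5516021, 5516071, 5516121]

# Convert the first reported pause from milliseconds to minutes.
first_minutes = pause_ms[0] / 1000 / 60

# Difference between consecutive reports, in milliseconds.
steps = [later - earlier for earlier, later in zip(pause_ms, pause_ms[1:])]

# Growth between the first xx:25:53 report and the first xx:26:03 report.
growth_ms = 5526172 - pause_ms[0]

print(f"first pause: {first_minutes:.1f} minutes")
print(f"steps between reports: {steps} ms")
print(f"growth over the logged interval: {growth_ms} ms")
```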
>
>> As a result, the standby guest failed in the construction of the independent cluster.
>>
>> It is recorded in log as if a timer stopped for 91 minutes.
>> It is abnormal length for 91 minutes.
>>
>> Did you see a similar problem?
>
> Never
Okay!
>
>>
>> Possibly I think whether it is libqb or Kernel or some kind of problems.
>
> What virtualization technology are you using? KVM?
>
>> * I suspect that the set of the timer failed in reset_pause_timeout().
>
> You can try to put asserts into this function, but there is really not
> too much reasons why it should fail (ether malloc returns NULL or some
> nasty memory corruption).
I read the source code, too, and it is as you say.
I do not know whether the problem will reproduce, but I will build the same
configuration on RHEL6.6 and run it under load this week.
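To make my suspicion concrete, here is a toy model of what I think may be happening. This is not corosync's actual code: the class, the threshold, and the tick values are all hypothetical, and only the name reset_pause_timeout is borrowed from the source. The assumption is that a periodic timer refreshes a timestamp, and that each incoming message is checked against that timestamp. If the timer is never re-armed, every check reports a pause that keeps growing.

```python
class PauseDetector:
    """Toy model of a pause detector (hypothetical, not corosync code)."""

    def __init__(self, threshold_ms, now_ms):
        self.threshold_ms = threshold_ms
        self.pause_timestamp_ms = now_ms

    def reset_pause_timeout(self, now_ms):
        # In the healthy case a periodic timer calls this to refresh the timestamp.
        self.pause_timestamp_ms = now_ms

    def check(self, now_ms):
        # Return the elapsed time if it exceeds the threshold, otherwise None.
        elapsed = now_ms - self.pause_timestamp_ms
        return elapsed if elapsed > self.threshold_ms else None


def simulate(timer_alive, reports=4, tick_ms=50, threshold_ms=500):
    """Run a few checks, with or without the periodic timer firing."""
    detector = PauseDetector(threshold_ms, now_ms=0)
    now_ms = 1000  # arbitrary time of the first incoming message
    seen = []
    for _ in range(reports):
        if timer_alive:
            detector.reset_pause_timeout(now_ms)
        seen.append(detector.check(now_ms))
        now_ms += tick_ms
    return seen


healthy = simulate(timer_alive=True)   # timer keeps refreshing the timestamp
stuck = simulate(timer_alive=False)    # timer was never re-armed
print("healthy:", healthy)
print("stuck:  ", stuck)
```

In the stuck case the reported values increase by one tick per check, which resembles the pattern in the log, although the real cause may of course be different.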
If you notice anything, please send me an email.
Best Regards,
Hideo Yamauchi.