On 07/05/2011 11:08 AM, Vladislav Bogdanov wrote:
> 05.07.2011 20:25, Steven Dake wrote:
>> On 07/05/2011 10:08 AM, Vladislav Bogdanov wrote:
>>> 05.07.2011 19:10, Steven Dake wrote:
>>>> On 07/05/2011 07:26 AM, Vladislav Bogdanov wrote:
>>>>> Hi all,
>>>>>
>>>>> In the last few days I have been seeing the following messages in the logs:
>>>>> [TOTEM ] Process pause detected for XXX ms, flushing membership messages.
>>>>>
>>>>> After that the ring is quickly re-established.
>>>>> DLM/clvmd notices this and switches to kern_stop, waiting for fencing
>>>>> to be done. However, what dlm_tool ls reports is really strange: the
>>>>> flags and members differ between nodes. I have dumps of what has been
>>>>> happening in dlm, and there are messages saying that fencing was done!
>>>>>
>>>>> On the other hand, pacemaker does not notice anything, so fencing is
>>>>> not done. This is rather strange, but that's a topic for another list.
>>>>>
>>>>> Can anybody please explain what exactly that message means and what
>>>>> the correct reaction of the upper services should be?
>>>>> Can it be solely caused by network problems?
>>>>> Can the number of buffers in the ethernet card's RX ring influence
>>>>> this (I did some tuning there some time ago)?
>>>>>
>>>>> corosync 1.3.1, UDPU transport.
>>>>> pacemaker-1.1-devel
>>>>> dlm_controld.pcmk from 3.0.17
>>>>> clvmd 2.02.85
>>>>> clusterlib-3.1.1
>>>>>
>>>>
>>>> This indicates that the kernel has paused scheduling of corosync, or
>>>> that corosync itself has blocked, for the time value printed in the
>>>> message.
>>>
>>> I suspected this, thanks for clarification.
>>>
>>>> Corosync is non-blocking.
>>>>
>>>> Are you running inside a VM?  Increasing the token timeout is probably
>>>> a necessity when running inside a VM on a heavily loaded host, because
>>>> kvm does not schedule as fairly as bare metal.
>>>>
>>>> Please provide feedback on whether this is bare metal or a VM.
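
(To make the token suggestion concrete: it is the token directive in the
totem section of corosync.conf. Something like the following in the existing
totem block, where 10000 is only an example value to be tuned for the
environment:

    totem {
            # token timeout in milliseconds; the stock default is 1000
            token: 10000
    }

As far as I remember corosync needs a restart to pick the change up.)
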
>>>
>>> I see this both on one node in a VM and on bare metal hosts under high
>>> load (30 VMs are being installed on each 12-core node, so CPU usage is
>>> quite high).
>>>
>>> I removed the ethernet RX ring buffer tuning from the physical hosts (it
>>> is now at the default of 256 instead of the maximum of 4096).
>>> We'll see what happens.
>>>
>>> This could also be a problem with the ethernet driver on the bare metal
>>> nodes.
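
(For double-checking the ring settings on the interfaces, ethtool shows both
the hardware maximums and the currently programmed values; eth0 below is
just an example interface name:

    # show maximum and current ring sizes
    ethtool -g eth0

    # set the RX ring back to the 256-descriptor default
    ethtool -G eth0 rx 256

A larger RX ring mainly changes buffering behaviour under bursty traffic, so
it would surprise me if it alone produced the pauses you are seeing, but it
is cheap to rule out.)
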
>>
>> Which ethernet driver?
> 
> igb-2.1.0-k2 from fc13's 2.6.34.9-69.
> 
>>
>>>
>>> For the VM I'll try to increase its weight via cgroups.
>>>
>>> Steve, can you please also explain why I'm unable to move corosync to
>>> another (non-default) CPU cgroup? Is this caused by its real-time
>>> priority? I just wanted to increase its weight.
>>>
>>
>> Not sure about the cgroups question, but corosync should be running ahead
>> of other processes, assuming cgroups follow POSIX scheduler semantics.
>> You could try corosync -p (run without realtime priority) and see whether
>> cgroups can be manipulated that way.
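
(A rough sketch of what I mean, assuming the cpu controller is mounted at
/sys/fs/cgroup/cpu; the path and the shares value are only examples:

    # create a dedicated group and raise its weight (default is 1024)
    mkdir /sys/fs/cgroup/cpu/corosync
    echo 2048 > /sys/fs/cgroup/cpu/corosync/cpu.shares

    # move corosync (its main thread) into the group
    echo $(pidof corosync) > /sys/fs/cgroup/cpu/corosync/tasks

If the kernel is built with RT group scheduling, that last write is rejected
for a realtime task unless the group is also given a cpu.rt_runtime_us
budget, which may be why the move fails while corosync runs at realtime
priority; started with corosync -p it should behave like any ordinary
SCHED_OTHER process.)
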
> 
> OK, will experiment, thanks.
>>
>> If you are running a really heavy load, a preemptible kernel config may
>> be useful (if that is not already the default).
> 
> That is fc13, and it has PREEMPT_VOLUNTARY chosen. That should mostly be
> enough. I'll try PREEMPT if I don't find any other cause of the failure.
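
(For what it is worth, a quick way to confirm which preemption model the
running kernel was built with, assuming the usual Fedora config file in
/boot:

    grep CONFIG_PREEMPT /boot/config-$(uname -r)

CONFIG_PREEMPT_VOLUNTARY=y is the voluntary model; CONFIG_PREEMPT=y is the
fully preemptible one.)
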
> 
>>
>> The kernel has changed so much in the 5 years since I worked on it daily
>> that I have no idea how the scheduler actually works any longer.
>>
> 
> 
> Thank you very much for your help,
> Vladislav
Vladislav,

I checked the archives and found a patch from some time ago that was never
merged.  It was never verified to resolve the "pause timeout" problem, which
is why it stayed out of the tree, but it could indeed solve it.

It should hit the mailing lists soon; if you could give it a spin and let us
know whether it helps, that would give us the verification we were missing.

Regards
-steve