On 07/08/2011 02:03 AM, Vladislav Bogdanov wrote:
>>> I checked the archives and found a patch from some time ago that was
>>> never merged.  It wasn't verified to resolve the "pause timeout" problem,
>>> but it could indeed solve it.  It wasn't merged because we lacked
>>> verification that it resolved the problem.
>>
>> Great, I'll try it in the next few days. The good news is that the
>> problem should be easily reproducible.
> 
> Hmm...
> Not so easily...
> 
> I applied that patch to all physical hosts, and have not seen that message
> for two days, regardless of the number of RX buffers in the adapter.
> 
> But I also do not see it if I downgrade to the previous image (without that
> patch) :( Although I have not re-tested the downgraded image for long, only
> several hours.
> 
> I didn't apply the patch to the VM, and do not see that message there either.
> What I also did:
> * Rescheduled the VM to a higher CPU priority (actually real-time)
> * Assigned a higher blkio priority to that VM
> * Assigned a low blkio priority to bulk resources on the node where that VM runs.
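
[Editor's note: the three steps above can be sketched with standard Linux
tools. This is only an illustration, not the poster's exact commands; the
VM PID, cgroup names, and the cgroup-v1 blkio layout are all assumptions
about the hosts' setup.]

```shell
# Move the VM's main process into a real-time scheduling class
# (SCHED_RR, priority 10; $VM_PID is a placeholder for the qemu PID).
chrt -r -p 10 "$VM_PID"

# Give the VM's cgroup a high blkio weight (cgroup v1 layout assumed;
# weights range from 10 to 1000).
echo 1000 > /sys/fs/cgroup/blkio/vm/blkio.weight

# And a low weight for the bulk-I/O workloads on the same node.
echo 100 > /sys/fs/cgroup/blkio/bulk/blkio.weight
```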
> So, the original problem seems to have different causes in the bare-metal
> and VM cases.
> 
> For the former, the patch seems to be helpful.
> It should help in the VM case too.
> 
> There were lots of '[TOTEM ] Retransmit List:' messages on the bare-metal
> hosts until I returned the eth RX ring size back to 256 buffers (from 4096).
> On reflection, this is probably expected, because more buffers add some
> latency, which is bad for corosync. I am not sure why that would affect the
> NAPI polling rate, though.
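
[Editor's note: for reference, the RX ring resize described above is done
with ethtool. A minimal sketch, assuming the interface is named eth0:]

```shell
# Show the current and maximum RX/TX ring sizes for the NIC.
ethtool -g eth0

# Shrink the RX ring back to 256 descriptors (from e.g. 4096).
# Larger rings can add buffering latency, which hurts corosync's
# latency-sensitive totem traffic.
ethtool -G eth0 rx 256
```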
> 
> I'll try to upgrade the igb driver (the newer version has the tuning
> parameter InterruptThrottleRate) and play again with the ring buffers and
> that rate.
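
[Editor's note: InterruptThrottleRate is a module parameter of Intel's
out-of-tree igb driver, taking one value per port. A hedged sketch of how
it would be set; the rate of 8000 interrupts/sec is only illustrative, and
the modprobe.d path assumes a typical distro layout:]

```shell
# Reload the igb driver with a fixed interrupt throttle rate
# (one comma-separated value per port).
modprobe -r igb
modprobe igb InterruptThrottleRate=8000,8000

# Persist the setting across reboots.
echo "options igb InterruptThrottleRate=8000,8000" > /etc/modprobe.d/igb.conf
```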
> 
> Again, the driver version I currently have may have bugs when operating
> with big ring buffers, which lead to 500 ms blocking under high load.
> 
> BTW, are those 'Retransmit List:' messages harmful?
> 

These are only warning messages; they result in a message being
retransmitted that may not have needed to be.  We are working out how to
eliminate these in some hardware environments.
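
[Editor's note: when these warnings persist, the totem parameters in
corosync.conf are the usual tuning knobs. A hedged config sketch; the
values below are purely illustrative, not recommendations from this
thread:]

```
totem {
    version: 2
    # Token timeout in ms; raising it tolerates longer scheduling pauses.
    token: 3000
    # Token rotations allowed before a message is declared lost.
    retransmits_before_loss_const: 10
    # Flow control: fewer messages in flight per token rotation can
    # reduce retransmit storms on slow or lossy links.
    window_size: 50
}
```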

Regards
-steve

> 
> Best,
> Vladislav

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
