Hi Simon,

> On 19 Nov 2012, at 07:15, Jakub Moscicki wrote:
>> Thanks for this analysis. The increased UDP works pretty well for us at CERN 
>> so far - albeit with one limit gone, other limits appear more pronounced.
> 
> I'm interested in what other limits you are hitting. I'm very aware of the 
> problems with the listener thread load and scheduling, are you hitting any 
> other problems with RX, or are they fileserver limitations?

Actually both. It's hard to fill a 10GE network with RX for some streaming 
use-cases from multiple underlying drives (not a major limitation for us yet, 
but it might matter for e.g. volserver volume moves). Conversely, now that 
reduced UDP packet loss gives better performance, it is much easier to saturate 
an individual underlying drive when many clients of a single user hammer one 
volume (batch jobs, essentially). We see two possible paths: more efficient 
caching (an SSD layer via device mapper, or just more RAM for buffers) and 
smarter throttling in the fileserver (for example, scheduling worker threads 
to take into account underlying I/O limitations and/or the QoS the fileserver 
should provide [e.g. a home directory fileserver geared towards interactive 
use versus a workspace fileserver geared towards batch jobs]). Both have many 
open questions. Will keep you posted.


> 
>>> management packets. 16Mbytes should be plenty providing that you don't
>>> 
>>> d) Have a large number of 1.6.0 clients on your network
>> 
>> Do you mean 1.6.0 (referring to a specific bug in 1.6.0) or 1.6.x (referring 
>> to some general change in client behaviour in 1.6 series)? 
> 
> Specifically 1.6.0, and prereleases. There is a truly unfortunate bug in those 
> clients which causes them to create a gradually escalating ping flood against 
> every fileserver they contact. At its worst, this creates a distributed 
> denial of service attack against your fileservers. In terms of this 
> discussion, the large number of incoming RX version packets can overwhelm the 
> listener thread. As these packets are not flow controlled, they can force 
> "real" data packets out of the UDP buffer. One solution to this problem is to 
> drop these packets at the kernel firewall - you want to drop all version 
> packets with an RX epoch of 999.

Good to know! Can this be done with just standard iptables?
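
To make the match criterion concrete, here is a minimal Python sketch of the 
test a filter would have to apply to each UDP payload. It assumes the usual 
28-byte RX wire header (five 32-bit words, four single bytes, two 16-bit 
words, all in network byte order) and that the version packet type is 13; 
both are assumptions on my side and should be checked against rx.h. The names 
RX_HEADER, BAD_EPOCH and should_drop are made up for this illustration. Under 
that layout the epoch sits at payload bytes 0-3 and the type at byte 20, so 
any firewall match that can compare payload bytes at fixed offsets (the 
iptables u32 match is one candidate) would have to test those two fields.

```python
import struct

# Assumed RX wire header layout (28 bytes, network byte order):
# epoch, cid, callNumber, seq, serial (32-bit each),
# type, flags, userStatus, securityIndex (8-bit each),
# serviceId, spare (16-bit each).
RX_HEADER = struct.Struct("!IIIIIBBBBHH")

RX_PACKET_TYPE_VERSION = 13  # assumed value of the version packet type
BAD_EPOCH = 999              # epoch used by the buggy clients, per the thread


def should_drop(payload: bytes) -> bool:
    """Return True for an RX version packet whose epoch is 999."""
    if len(payload) < RX_HEADER.size:
        return False  # too short to carry an RX header, leave it alone
    epoch, _cid, _call, _seq, _serial, ptype, *_rest = RX_HEADER.unpack_from(payload)
    return ptype == RX_PACKET_TYPE_VERSION and epoch == BAD_EPOCH
```

Only packets that satisfy both conditions are dropped, so version probes from 
healthy clients (which use a different epoch) and all data packets pass 
through untouched.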

kuba

--

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
