Hi Simon,

> On 19 Nov 2012, at 07:15, Jakub Moscicki wrote:
>> Thanks for this analysis. The increased UDP works pretty well for us at CERN
>> so far - albeit one limit gone other limits appear more pronounced.
>
> I'm interested in what other limits you are hitting. I'm very aware of the
> problems with the listener thread load and scheduling, are you hitting any
> other problems with RX, or are they fileserver limitations?

Actually both. It's hard to fill up the 10GE network with RX for some
streaming use-cases from multiple underlying drives (not a major limitation
for us yet, but filling the link might be useful for e.g. volserver volume
moves). Conversely, with the reduced UDP packet loss giving better
performance, it is much easier to saturate an individual underlying drive
when many clients of a single user hammer one volume (batch jobs,
essentially).

Two possible paths: more efficient caching (with an SSD layer via device
mapper, or with just more RAM for buffers) and smarter throttling in the
fileserver (for example, scheduling worker threads to take underlying I/O
limitations into account and/or according to the QoS the fileserver is meant
to provide [e.g. a home directory fileserver geared towards interactive use
versus a workspace fileserver geared towards batch jobs]). Both have many
open questions. Will keep you posted.

>
>>> management packets. 16Mbytes should be plenty providing that you don't
>>>
>>> d) Have a large number of 1.6.0 clients on your network
>>
>> Do you mean 1.6.0 (referring to a specific bug in 1.6.0) or 1.6.x (referring
>> to some general change in client behaviour in 1.6 series)?
>
> Specifically 1.6.0, and prereleases. There is a truly unfortunate bug in those
> clients which causes them to create a gradually escalating ping flood against
> every fileserver they contact. At its worst, this creates a distributed
> denial of service attack against your fileservers. In terms of this
> discussion, the large number of incoming RX version packets can overwhelm the
> listener thread. As these packets are not flow controlled, they can force
> "real" data packets out of the UDP buffer. One solution to this problem is to
> drop these packets at the kernel firewall - you want to drop all version
> packets with an RX epoch of 999.

Good to know! Can this be done with just standard iptables?

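For reference, the test such a filter has to apply to each UDP payload can
be sketched in a few lines of Python. This is only an illustration, not a
firewall rule. It assumes the usual 28-byte RX wire header with the epoch in
the first 32-bit word and the packet type in the byte at offset 20, and it
assumes that the "version" packet type has the value 13. Only the epoch
value 999 comes from the message quoted above, and the function name is
made up for the example.

```python
import struct

# Assumptions (not stated in the thread): RX wire header layout and type code.
RX_HEADER_LEN = 28            # five 32-bit words, four 1-byte fields, two 16-bit fields
RX_TYPE_OFFSET = 20           # packet type byte follows the five 32-bit words
RX_PACKET_TYPE_VERSION = 13   # assumed numeric value of the RX "version" type

# From the thread: the packets to drop carry an RX epoch of 999.
FLOOD_EPOCH = 999


def is_flood_version_packet(udp_payload: bytes) -> bool:
    """Return True if the payload looks like an RX version packet with epoch 999."""
    if len(udp_payload) < RX_HEADER_LEN:
        return False  # too short to hold a full RX header
    # The epoch is the first 32-bit word, in network byte order.
    (epoch,) = struct.unpack_from("!I", udp_payload, 0)
    pkt_type = udp_payload[RX_TYPE_OFFSET]
    return epoch == FLOOD_EPOCH and pkt_type == RX_PACKET_TYPE_VERSION
```

A kernel-level rule would need to express the same two comparisons as
fixed-offset matches on the UDP payload.
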
kuba
