Dear AFS Gurus,
At CERN we have been suffering occasional incidents where the time to 
access any volume/partition on a busy AFS server suddenly increases to 
~infinity (or at least 20 seconds). During these incidents we consistently 
notice:
 - at least one user is hammering the fileserver (from 10s or 100s of batch 
jobs)
 - the time to write 64kB to any volume on the affected server goes from the 
usual ~10ms to >10-20 seconds
 - network throughput is "flat" for the duration of the incident, but well 
below the historical peak -- sometimes ~50MB/s, sometimes up to ~150MB/s 
(the server has a 10Gb/s network card)
 - CPU usage is also flat at ~120% (i.e. one processor plus a bit)
 - iostat shows little or no disk activity
 - there is no shortage of threads (more than 100 are idle).

We are able to reproduce the issue in a synthetic stress-test environment 
(with both v1.4.14+CERN patches and vanilla 1.6.1a code). With the vanilla 
1.6.1a fileserver, we hit this access-time wall by creating >=30 clients 
which simultaneously cp a 10GB file from AFS into /dev/null.
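For concreteness, the read test is just many parallel whole-file reads. A 
minimal sketch (the AFS path below is a placeholder, not our real volume):

```shell
# Start n parallel clients, each streaming the same file into /dev/null,
# then wait for all of them to finish.
stress_read() {  # usage: stress_read <src-file> <n-clients>
  src=$1; n=$2
  for i in $(seq 1 "$n"); do
    cp "$src" /dev/null &
  done
  wait
}
# e.g.: stress_read /afs/example.org/testvol/10GB.dat 30
```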

Recently, while trying to reproduce the issue with rxperf, we found that the 
issue boils down to overrunning the UDP socket buffer, i.e. huge numbers of 
dropped UDP packets (we see >10% packet errors in /proc/net/snmp). By 
increasing the buffer size we can effectively mitigate the problem.
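A quick way to watch for this is the "Udp:" line of /proc/net/snmp. A small 
helper of our own (not part of OpenAFS) that prints the error rate:

```shell
# Print the UDP datagram error rate from a /proc/net/snmp-format file.
# The first "Udp:" line names the columns and the second holds the
# counters, so we select the line whose second field is numeric.
# Columns: InDatagrams NoPorts InErrors OutDatagrams ...
udp_error_rate() {
  awk '/^Udp:/ && $2 ~ /^[0-9]/ {
    printf "InDatagrams=%s InErrors=%s (%.2f%% errors)\n",
           $2, $4, ($2 > 0 ? 100 * $4 / $2 : 0)
  }' "${1:-/proc/net/snmp}"
}
# udp_error_rate          # reads the live /proc/net/snmp by default
```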

We currently run fileservers with udpsize=2MB, and at that size we hit a 
30-client limit in our test environment. With an 8MB buffer (after raising 
the kernel maximum with sysctl and passing the fileserver option), we see no 
dropped UDP packets during the client-reading stress test, but still some 
drops when all clients write to the server. With a 16MB buffer we see no 
dropped packets at all, reading or writing.
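For reference, raising the limit takes both a kernel and a fileserver change. 
The settings for the 16MB case look roughly like this (the sysctl names are 
the standard Linux ones):

```shell
# Raise the kernel's socket-buffer ceilings (defaults are far below 16MB):
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
# ...and start the fileserver with a matching receive buffer (in bytes):
#   fileserver ... -udpsize 16777216
```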

In practice, with this very large UDP buffer we can decrease the access time 
from ~infinity to less than 1 second on a very heavily loaded server (e.g. 
250 clients writing 1GB each).
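One back-of-envelope way to reason about the sizing (our own arithmetic, not 
from any OpenAFS documentation): the buffer only has to bridge the interval 
during which the fileserver falls behind draining the socket, so it is worth 
knowing how many milliseconds of a full-rate burst a given size absorbs:

```shell
# Milliseconds of a full-rate burst a socket buffer can absorb before
# packets are dropped: buffer-bits / link-rate, converted to ms.
absorb_ms() {  # usage: absorb_ms <buffer-bytes> <link-gbit/s>
  awk -v buf="$1" -v gbps="$2" \
      'BEGIN { printf "%.1f\n", buf * 8 / (gbps * 1e9) * 1000 }'
}
absorb_ms $((16 * 1024 * 1024)) 10   # a 16MB buffer covers ~13.4ms at 10Gb/s
```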

We plan to roll these very large UDP buffer sizes into production, but wanted 
to check here first if we have missed something. Does anyone foresee problems 
with using a 16MB UDP buffer?

By the way, we have also compared the access latency of 1.4.14 and 1.6.1a in 
our rxperf tests. In general we find that 1.6.1a provides a 2-3x speedup 
(e.g. a hammered 1.6.1a fileserver has a 64kB write latency of ~300ms vs ~1s 
for 1.4.14), confirming a significant performance improvement in 1.6.

Best Regards,
Dan van der Ster, CERN IT-DSS
on behalf of the CERN AFS Team
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
