Hi Tom!

On Jun 19, 2005, at 5:03 AM, Tom Keiser wrote:

> On 6/16/05, Roland Kuhn <[EMAIL PROTECTED]> wrote:
>> Dear experts!
>>
>> We have been fighting with fileserver performance for a long time. Once I
>> got the advice to use the single-threaded fileserver, which helped, but
>> didn't get me more than 10MB/s. Now we have upgraded to Debian sarge
>> (openafs 1.3.81), which again ships the threaded server. With default
>> settings we get 1MB/s (the underlying RAID can easily deliver >200MB/s,
>> which shows that the VM settings are okay). Now I have tried -L -vc 10000
>> -cb 100000 -udpsize 12800, which brings it back to about 6MB/s (all
>> numbers with >>1 simultaneous clients reading). This is still a factor of
>> 30 below what the RAID can do (okay, we only have 1Gbit/s ethernet ;-) ).
>> I've seen excessive context switch rates (>>100000/s), which obviously
>> don't happen with the single-threaded fileserver. So, can anybody comment
>> on these numbers? These are dual Opteron boxes with enough RAM, so please
>> make some suggestions as to which options I should try in order to get
>> something closer to the real performance of a fileserver...
>
> This is very interesting. On much, much older hardware (2x 300MHz Sun E450
> running Solaris 10) I can get >15MB/s aggregate off a single FC-AL disk
> with >>1 clients over gigE, with absolutely no tweaking of fileserver
> parameters. Of course, there are many performance bottlenecks in
> multithreading that are actually exacerbated by faster CPUs, so the results
> you're seeing are plausible.

Interesting indeed. Thanks for the very detailed answer, it'll take some time
for me to digest it ;-)
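(In case anyone wants to reproduce the context-switch observation: I believe
plain vmstat on the fileserver is enough to see it, watching the "cs" column
while the clients are reading. Just a sketch, nothing openafs-specific:)

    # print system-wide statistics once per second; the "cs" column shows
    # the number of context switches per second
    vmstat 1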
Right now I'm using the single-threaded fileserver with -L -vc 10000 -udpsize
128000 -rxpck 400 -busyat 200 -cb 100000, and I'm getting 11MB/s using 8 slow
and 3 fast clients. The workload is sequential reading of large 1.5GB files
(thousands of them).
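(In case it helps anyone following along: those flags just go onto the
fileserver line of the fs bnode. Recreating the instance looks roughly like
this; only a sketch, assuming the Transarc-style /usr/afs paths and an
instance named "fs", the Debian packages put the binaries elsewhere:)

    # stop and remove the current fs instance, then recreate it with the
    # desired fileserver flags (volserver and salvager lines unchanged)
    bos stop <server> fs -localauth
    bos delete <server> fs -localauth
    bos create <server> fs fs \
      "/usr/afs/bin/fileserver -L -vc 10000 -udpsize 128000 -rxpck 400 -busyat 200 -cb 100000" \
      "/usr/afs/bin/volserver" \
      "/usr/afs/bin/salvager" \
      -localauth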
> I'd be interested in seeing a comparison of 1.3.81 and 1.3.84 performance.
> Several threading patches were integrated between these revisions, and it
> would be interesting to see how they affect your problem. I know they are
> making a difference on sparc, but that doesn't necessarily correlate to
> amd64.

We're not using amd64 yet, we're still on i386 Debian. But I can easily
compile the 1.3.84 fileserver, unless problems are to be expected with 1.2.13
clients (I've not seen any so far with the 1.3.81-sarge1 fileserver).
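Building it should be no big deal; from the 1.3.84 tarball I'd expect
something like the following to be enough (a sketch, assuming the
Transarc-style layout, where the server binaries should end up under
<sysname>/dest/root.server/usr/afs/bin):

    # build from the release tarball with Transarc-style paths
    ./configure --enable-transarc-paths
    make
    make dest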
> If you upgrade to 1.3.84, there is another fileserver option that you will
> want to experiment with: -rxpck. Sometime after 1.3.81, thread-local packet
> queues were integrated, and they may reduce your context switch rate due to
> less contention for the global packet queue lock. The default value for
> -rxpck will give you approximately 500 rx_packet structures. I recommend
> trying several values in the range 1000-5000. At some point, you will reach
> an optimal tradeoff between a small value that fits within your cache
> hierarchy, and a large value that reduces the number of transfers between
> the thread-local and global rx_packet queues. Before submitting the
> thread-local patch to RT, I was only able to test on a few architectures,
> and I'd like to get feedback for amd64.
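Once we can run 1.3.84 I'll try a few values in that range. I assume the
easiest way to see whether a given -rxpck value is big enough is to watch the
rx statistics on the fileserver port while the clients are reading, along
these lines (a sketch; <fileserver> stands for our server, 7000 is the
fileserver port):

    # dump aggregate rx statistics without listing every connection;
    # this should show whether rx is running short of packets
    rxdebug <fileserver> 7000 -rxstats -noconns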
Sorry, the upgrade to amd64 will only be done in a few months :-(
> Another option you might care to experiment with is -p. IIRC, the default
> will give you 12 worker threads. It sounds like many of your worker threads
> are busy handling calls, but are constantly contending over locks and
> blocking on i/o. You will need to experiment with this, but you may find
> that reducing the number of worker threads actually improves performance by
> forcing new calls to queue up, thereby allowing your active calls to
> complete with less contention. Of course, this won't alleviate the problems
> caused by blocking i/o. Reducing this value too far is dangerous, because
> some calls have high latencies (e.g. some calls make calls to the
> ptserver).
>
> Have you looked at the xstat results from your servers? afsmonitor is a
> great little tool, and it can even dump these results periodically to a
> log. This data could help us to understand your workload. Seeing those
> numbers would also help us with suggesting changes to parameters in the
> volume package.

Well, I have no idea what the numbers mean, but I've attached the output of
"afsmonitor -detailed -fshost <XXX>". In this case the workload was such that
the computation on the client side limited the throughput to about 8MB/s. The
problem is that this fileserver is heavily used by our cluster, so I cannot
easily introduce downtime or change the workload.
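I'll also let afsmonitor collect data over a longer period; as far as I can
tell this should just be the following (a sketch, assuming I read the
-frequency and -output options correctly):

    # probe the fileserver every 60 seconds and write the detailed xstat
    # data to a log file for later analysis
    afsmonitor -fshosts <XXX> -frequency 60 -output /tmp/afsmon.log -detailed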
Thanks for all your help!
Ciao,
Roland
[Attachment: afs.log (the afsmonitor output mentioned above)]
--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85747 Garching
Telefon 089/289-12592; Telefax 089/289-12570

A mouse is a device used to point at the xterm you want to type in.
Kim Alm on a.s.r.

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL++++ P-(+) L+++ E(+) W+ !N K- w--- M+ !V Y+
PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++>++++ h---- y+++
------END GEEK CODE BLOCK------
