Re: [Pvfs2-developers] threaded client-core and the device thread

Phil Carns Tue, 17 Oct 2006 09:54:00 -0700

Just to see if I'm noticing the same issue, what was the exact problem Phil was noticing? Shouldn't multiple requests take longer than a single request?
The workload I was using was multiple rpc.nfsd threads issuing 64 KB requests (through the writev/readv interface) to the PVFS2 kernel module (and then to client-core and so on). To make things easy, I bet using iozone with multiple threads and a random workload would simulate this workload quite well. What I was noticing is that although we haven't reached disk, CPU, or network limits, the I/O throughput is fixed at some low value.
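A rough stand-in for that workload, for anyone who wants to reproduce it without an NFS server, might look like the sketch below. It is only an illustration: the function names, the thread and request counts, and the target file are assumptions, not anything from the PVFS2 tree. It uses Python's os.preadv/os.pwritev (available on Linux with Python 3.7 or later) to issue 64 KB vectored requests at random aligned offsets from several threads.

```python
import os
import random
import threading
import time

REQUEST_SIZE = 64 * 1024  # 64 KB requests, as in the workload described above


def worker(path, file_size, num_requests, do_write, seed, totals, index):
    """Issue num_requests vectored requests at random request-aligned offsets."""
    rng = random.Random(seed)
    slots = file_size // REQUEST_SIZE
    buf = bytearray(REQUEST_SIZE)
    fd = os.open(path, os.O_RDWR)
    moved = 0
    try:
        for _ in range(num_requests):
            offset = rng.randrange(slots) * REQUEST_SIZE
            if do_write:
                moved += os.pwritev(fd, [buf], offset)
            else:
                moved += os.preadv(fd, [buf], offset)
    finally:
        os.close(fd)
    totals[index] = moved


def run_workload(path, file_size, num_threads=4, num_requests=256, do_write=False):
    """Return (bytes_moved, seconds) for num_threads concurrent workers.

    The file at `path` must already exist and be at least file_size bytes long.
    """
    totals = [0] * num_threads
    threads = [
        threading.Thread(
            target=worker,
            args=(path, file_size, num_requests, do_write, i, totals, i),
        )
        for i in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals), time.perf_counter() - start
```

Pointing `path` at a pre-created file on a PVFS2 mount and comparing bytes per second for num_threads=1 against several threads would show whether aggregate throughput stays flat as concurrency grows.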
One test Sam and I tried was to increase the number of kernel mmapped buffers. Instead of five 4MB buffers, we used sixty-four 128KB buffers. This reduced performance considerably, especially read performance. Since we are using 64KB requests, this should not be an issue, but it was. One thing we didn't get a chance to try was if the reduced performance was because of the increase in buffers or the reduction in size. My guess would be the increase, but why would this be?
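For reference, the arithmetic behind the two configurations can be checked with a few lines of Python. This is only a sketch, and the class and field names are made up for illustration. It shows that a 64 KB request fits in a buffer of either size, and that the two setups also differ in total mapped memory, so more than one variable changed between the runs.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB
REQUEST_SIZE = 64 * KB


@dataclass(frozen=True)
class BufferConfig:
    count: int
    size: int  # bytes per buffer

    @property
    def total_bytes(self):
        return self.count * self.size

    @property
    def requests_per_buffer(self):
        # How many 64 KB requests one buffer is large enough to hold.
        return self.size // REQUEST_SIZE


original = BufferConfig(count=5, size=4 * MB)
modified = BufferConfig(count=64, size=128 * KB)

for name, cfg in (("original", original), ("modified", modified)):
    print(name, cfg.total_bytes // MB, "MB total,",
          cfg.requests_per_buffer, "x 64 KB requests per buffer")
# original 20 MB total, 64 x 64 KB requests per buffer
# modified 8 MB total, 2 x 64 KB requests per buffer
```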
Beyond inefficient coding issues, Sam and I talked about where the bottleneck could be from a design standpoint. We came up with the following list:
0) kmapping and copying data is going as fast as possible
1) Sending message through the pvfs2-req device can only happen at a constant rate.
2) client-core reading message off the pvfs2-req device (should no longer be an issue with the --threaded option, but maybe reading 5 at a time is still inefficient)
3) A single BMI thread issuing I/O requests. Are multiple threads necessary to issue the multiple I/O requests from the kernel?
Can anyone think of other parts of the I/O path that might be a bottleneck? So far, we have only started to investigate items 1 and 2.
Thanks for everyone's help.
Dean

To follow up on your question of what I saw, see the following mailing list post. This was back in June but I haven't had an opportunity to really look at it closer yet:


http://www.beowulf-underground.org/pipermail/pvfs2-developers/2006-June/002208.html

At any rate, I was running a benchmark with 5 processes on a single node. I found that I got a significant performance improvement by limiting the kernel module to only 1 transfer buffer rather than the default of 5.

If you are seeing the same issue that I was, then it seems to indicate that the number of buffers that you are using is causing additional slowdown rather than the size of the buffers. I have no idea why. It may be a direct problem with the mechanism that handles the buffers, or it may be an indirect result elsewhere that only shows up when we get concurrent I/O operations in flight from the VFS.
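One way to separate the two variables would be a small sweep that holds one fixed while varying the other. The sketch below just enumerates such a matrix. The specific counts and sizes are assumptions chosen to bracket the configurations mentioned in this thread, the helper names are hypothetical, and how the values actually get applied to the kernel module is deliberately left out.

```python
from itertools import product

KB = 1024
MB = 1024 * KB

# Assumed sweep points bracketing the configurations mentioned in the thread.
COUNTS = [1, 5, 64]
SIZES = [128 * KB, 4 * MB]


def sweep():
    """Yield (count, size_bytes, total_bytes) for every count/size pair."""
    for count, size in product(COUNTS, SIZES):
        yield count, size, count * size


def vary_count(size):
    """Configurations that differ only in buffer count."""
    return [cfg for cfg in sweep() if cfg[1] == size]


def vary_size(count):
    """Configurations that differ only in buffer size."""
    return [cfg for cfg in sweep() if cfg[0] == count]
```

Comparing throughput across vary_count(4 * MB) would test the buffer-count hypothesis directly, while vary_size(5) would show whether size matters at the default count. Some combinations (for example 64 buffers of 4 MB) need far more mapped memory than either tested setup and may not be practical.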


-Phil

