Just to check whether I'm seeing the same issue: what was the exact
problem Phil noticed? Shouldn't multiple requests take longer than a
single request?
The workload I was using was multiple rpc.nfsd threads issuing 64 KB
requests (through the writev/readv interface) to the PVFS2 kernel module
(and from there to client-core and so on). For an easier reproduction, I
bet iozone with multiple threads and a random workload would simulate
this quite well. What I noticed is that although we hadn't reached
disk, CPU, or network limits, the I/O throughput was pinned at some
low value.
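The shape of that workload can be sketched with a small multithreaded
program. This is only an illustration: the thread count, request count,
and use of a temp file are made-up stand-ins for the actual rpc.nfsd
setup, and only the 64 KB request size comes from the description above.

```python
import os
import tempfile
import threading

REQUEST_SIZE = 64 * 1024    # 64 KB, matching the rpc.nfsd requests
NUM_THREADS = 4             # illustrative; not the real nfsd thread count
REQUESTS_PER_THREAD = 16    # illustrative

def worker(fd, thread_id, results):
    buf = bytes(REQUEST_SIZE)
    written = 0
    for i in range(REQUESTS_PER_THREAD):
        # Each thread writes its own region, like independent NFS requests
        offset = (thread_id * REQUESTS_PER_THREAD + i) * REQUEST_SIZE
        written += os.pwrite(fd, buf, offset)
    results[thread_id] = written

fd, path = tempfile.mkstemp()
results = {}
threads = [threading.Thread(target=worker, args=(fd, t, results))
           for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.close(fd)
os.unlink(path)

total = sum(results.values())
print(total)  # 4194304 bytes (4 threads x 16 requests x 64 KB)
```

The point of the sketch is simply that many threads are issuing
fixed-size requests concurrently; where those requests queue up inside
the PVFS2 path is the open question.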
One test Sam and I tried was changing the number of kernel mmapped
buffers: instead of five 4 MB buffers, we used sixty-four 128 KB
buffers. This reduced performance considerably, especially read
performance. Since our requests are only 64 KB, the buffer size should
not have been an issue, but it was. One thing we didn't get a chance to
isolate was whether the slowdown came from the increase in buffer count
or the reduction in buffer size. My guess would be the count, but why
would that matter?
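What makes this puzzling is that, naively, more buffers should only
help: either buffer size holds a 64 KB request, and a larger pool lets
more requests hold a buffer at once. A toy back-of-the-envelope model of
that naive expectation (the in-flight count of 10 is just an example,
not a measured number):

```python
def available_slots(buffer_count, in_flight):
    """Toy model: how many requests can hold a transfer buffer at once.

    This is not PVFS2 code -- just the naive expectation that each
    in-flight request occupies one buffer until it completes.
    """
    return min(buffer_count, in_flight)

# Either configuration fits a 64 KB request in a single buffer:
assert 64 * 1024 <= 4 * 1024 * 1024     # fits a 4 MB buffer
assert 64 * 1024 <= 128 * 1024          # fits a 128 KB buffer

# With, say, 10 concurrent 64 KB requests in flight:
print(available_slots(5, 10))    # 5  -- old config: 5 proceed, 5 wait
print(available_slots(64, 10))   # 10 -- new config: all 10 proceed
```

By this reasoning the sixty-four-buffer configuration should be at least
as fast, so the observed slowdown points at per-buffer overhead somewhere
in the buffer-management path rather than at capacity.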
Beyond inefficient coding issues, Sam and I talked about where the
bottleneck could be from a design standpoint. We came up with the
following list:
0) kmapping and copying data: is it going as fast as possible?
1) Sending messages through the pvfs2-req device can only happen at a
constant rate.
2) client-core reading messages off the pvfs2-req device (this should no
longer be an issue with the --threaded option, but maybe reading 5 at a
time is still inefficient).
3) A single BMI thread issuing I/O requests. Are multiple threads
necessary to issue the multiple I/O requests from the kernel?
Can anyone think of other parts of the I/O path that might be a
bottleneck? So far, we have only started to investigate items 1 and 2.
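Items 1 and 3 above share a pattern: a single service point with a fixed
per-request cost caps total throughput no matter how many requesters
feed it. A toy producer/consumer model of that ceiling (the request
counts and the 1 ms service time are invented for illustration, not
measurements of the real I/O path):

```python
import queue
import threading
import time

def producers_vs_single_consumer(n_producers, items_each, service_time):
    """Toy model: one consumer services every request at a fixed cost,
    so elapsed time is bounded below by total_requests * service_time
    regardless of producer count."""
    q = queue.Queue()
    done = threading.Event()

    def produce():
        for _ in range(items_each):
            q.put(1)

    def consume():
        for _ in range(n_producers * items_each):
            q.get()
            time.sleep(service_time)   # fixed per-request service cost
        done.set()

    start = time.monotonic()
    for _ in range(n_producers):
        threading.Thread(target=produce).start()
    threading.Thread(target=consume).start()
    done.wait()
    return time.monotonic() - start

t1 = producers_vs_single_consumer(1, 20, 0.001)
t4 = producers_vs_single_consumer(4, 20, 0.001)
# On an idle machine t4 is roughly 4x t1: adding producers doesn't help
# because the single consumer is the ceiling.
print(t1, t4)
```

If something like this is happening in the pvfs2-req device path or the
single BMI thread, adding client-side concurrency would show exactly the
flat throughput described above.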
Thanks for everyone's help.
Dean
To follow up on your question about what I saw, see the following
mailing list post. This was back in June, but I haven't had a chance to
look at it more closely yet:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2006-June/002208.html
At any rate, I was running a benchmark with 5 processes on a single
node. I found that I got a significant performance improvement by
limiting the kernel module to only 1 transfer buffer rather than the
default of 5.
If you are seeing the same issue that I was, then it seems to indicate
that the number of buffers, rather than their size, is causing the
additional slowdown. I have no idea why. It may be a direct problem with
the mechanism that manages the buffers, or it may be an indirect effect
elsewhere that only shows up when we get concurrent I/O operations in
flight from the VFS.
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers