Smile I/O. I might have to steal that tag-line. :-)
On Sep 27, 2009, at 12:21 AM, Milo wrote:
Apologies for another post: The subject should read SMALL I/O. I am
apparently in a subconsciously upbeat mood.
On Sep 27, 2009, at 1:19 AM, Milo wrote:
Hi, guys. Milo from CMU here.
I'm looking into small I/O performance on PVFS2. It's actually part
of a larger project investigating possible improvements to the
performance of cloud computing software, and we're using PVFS2 as a
kind of upper bound for performance (e.g. writing a flat file on a
parallel filesystem as opposed to updating data in an HBase table).
One barrier I've encountered is the small-I/O nature of many of
these cloud workloads. For example, the one we're looking at
currently issues 1 KB I/O requests even when performing sequential
writes to generate a file.
On large I/O requests, I've managed to tweak PVFS2 to get close to
the performance of the underlying filesystem (115 MB/s or so). But
on small I/O requests performance is much lower. It seems I can
only get approximately 5,000 I/O operations/second even when
running sequential write tests against a single-node server
(4.7 MB/s with 1 KB sequential writes, 19.0 MB/s with 4 KB
sequential writes; both rates work out to roughly 4,700-4,800
ops/second).
The filesystem is mounted through the PVFS2 kernel module. This
seems similar to the Bonnie++ rates in ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1010.pdf
None of this is unexpected to me, and I'm happy with PVFS2's large
I/O performance. But I'd like to get a better handle on where this
bottleneck is coming from, code-wise (and how I could fix it if I
find coding time between research). Here's some experimentation
I've done so far:
1) A small pair of C client/server programs that open and close
TCP connections in a tight loop, pinging each other with a small
amount of data ('Hello World'). I see about 10,000 connections/second
with this approach, so if each small I/O is opening and closing two
TCP connections, this could be the bottleneck. I haven't yet dug
into the pvfs2-client code and the library to see whether it reuses
TCP connections or makes new ones on each request (that's deeper
into the flow code than I remember =;) ). A rough sketch of this
test appears after this list.
2) I can write to the underlying filesystem with 1 KB sequential
writes almost as quickly as with 1 MB writes, so it's not the
underlying ext3 (a sketch of this test also follows the list).
3) The I/O ops/second bottleneck is there even with the null-aio
TroveMethod, so I doubt it's Trove.
4) atime is getting updated even with null-aio, so a metadata
barrier is possible. The file size gets updated with null-aio as
well.
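
In case it helps anyone reproduce this, here's a rough sketch of the
connect/close loop from (1). It isn't the exact program; the host,
port, and iteration count are just placeholders, and it assumes a
simple echo-style server is already listening.

/* Sketch of the connect/close microbenchmark described in (1).
 * HOST, PORT, and ITERS are placeholders; the server side just
 * accepts, echoes the payload back, and closes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define HOST  "127.0.0.1"   /* placeholder server address */
#define PORT  9000          /* placeholder server port */
#define ITERS 20000

int main(void)
{
    char buf[64];
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    inet_pton(AF_INET, HOST, &addr.sin_addr);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < ITERS; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            exit(1);
        }
        write(fd, "Hello World", 11);   /* small payload */
        read(fd, buf, sizeof(buf));     /* wait for the server's reply */
        close(fd);                      /* new connection every iteration */
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f connections/second\n", ITERS / secs);
    return 0;
}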
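
And a similarly rough sketch of the sequential-write comparison from
(2); again, the target path and sizes are placeholders. Run it once
with a 1 KB chunk and once with a 1 MB chunk to compare.

/* Sketch of the sequential-write test described in (2): write TOTAL
 * bytes in fixed-size chunks and report the bandwidth.  PATH, CHUNK,
 * and TOTAL are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define PATH  "/mnt/ext3/testfile"    /* placeholder target on the underlying fs */
#define CHUNK (1024)                  /* 1 KB per write(); try 1048576 as well */
#define TOTAL (256UL * 1024 * 1024)   /* 256 MB total */

int main(void)
{
    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    int fd = open(PATH, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (unsigned long written = 0; written < TOTAL; written += CHUNK)
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }

    close(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f MB/s\n", TOTAL / secs / 1e6);
    free(buf);
    return 0;
}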
Some configuration information about the filesystem:
* version 2.8.1
* The strip_size is 4194304. Not that this should matter a great
deal with one server.
* FlowBufferSizeBytes 4194304
* TroveSyncMeta and TroveSyncData are set to no
* I've applied the patch from http://www.pvfs.org/fisheye/rdiff/PVFS?csid=MAIN:slang:20090421161045&u&N
to be sure metadata syncing really is off, though I'm not sure how
to check. =:)
Other than seeing a big performance difference with/without the patch,
there's not a good way to do that. You could strace the server and
see how many syncs are being performed, but that's not ideal. Getting
an idea of the performance without the kernel interface (as Rob
mentioned) would help narrow down the problem, but these are IOzone
runs, right? Using the mpi-io-test program with the pvfs ROMIO driver
might be the easiest way to run a similar small-I/O test without the
kernel module in the loop.
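
If mpi-io-test doesn't quite match your access pattern, something
along these lines would exercise the same path. This is just a
sketch; the "pvfs2:" path, chunk size, and write count are
placeholders.

/* Sketch of a small sequential-write test through ROMIO's pvfs2
 * driver, bypassing the kernel module.  The "pvfs2:" prefix forces
 * the pvfs2 ADIO driver; path, CHUNK, and COUNT are placeholders.
 * Run with a single MPI process for a single-client test. */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define CHUNK 1024      /* 1 KB writes */
#define COUNT 100000    /* number of writes */

int main(int argc, char **argv)
{
    char buf[CHUNK];
    memset(buf, 'x', CHUNK);

    MPI_Init(&argc, &argv);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/smallio.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    double t0 = MPI_Wtime();
    for (int i = 0; i < COUNT; i++)
        MPI_File_write(fh, buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);
    double t1 = MPI_Wtime();

    MPI_File_close(&fh);

    printf("%.0f ops/s, %.1f MB/s\n",
           COUNT / (t1 - t0), COUNT * (double)CHUNK / (t1 - t0) / 1e6);

    MPI_Finalize();
    return 0;
}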
Thanks.
~Milo
PS: Should I send this to the pvfs2-developers list instead?
Apologies if I've used the wrong venue.
Users is the right place. When you send patches for all the
performance improvements you're making, those can go to developers. ;-)
-sam
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users