Apologies for another post: The subject should read SMALL I/O. I am
apparently in a subconsciously upbeat mood.
On Sep 27, 2009, at 1:19 AM, Milo wrote:
Hi, guys. Milo from CMU here.
I'm looking into small I/O performance on PVFS2. It's actually part
of a larger project investigating possible improvements to the
performance of cloud computing software, and we're using PVFS2 as a
kind of upper bound for performance (e.g. writing a flat file on a
parallel filesystem as opposed to updating data in an HBase table).
One barrier I've encountered is the small-I/O nature of many of
these cloud workloads. For example, the one we're looking at
currently issues 1 KB I/O requests even when performing sequential
writes to generate a file.
On large I/O requests, I've managed to tweak PVFS2 to get close to
the performance of the underlying filesystem (115 MB/s or so). But
on small I/O requests, performance is much lower. It seems I can only
manage approximately 5,000 I/O operations/second, even when testing
sequential writes on a single-node server: 4.7 MB/s with 1 KB
sequential writes (about 4,800 ops/s) and 19.0 MB/s with 4 KB
sequential writes (about 4,900 ops/s), so the ops/s ceiling is
roughly constant across request sizes. The filesystem is mounted
through the PVFS2 kernel module. This seems similar to the Bonnie++
rates in ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1010.pdf
None of this is unexpected to me, and I'm happy with PVFS2's large
I/O performance. But I'd like to get a better handle on where this
bottleneck is coming from in the code (and how I could fix it, if I
find coding time between research). Here's some experimentation I've
done so far:
1) A small pair of C client/server programs that open and close TCP
connections in a tight loop, pinging each other with a small amount
of data ('Hello World'). I see about 10,000 connections/second with
this approach (see the sketch after this list). So if each small I/O
opens and closes two TCP connections, this could be the bottleneck.
I haven't yet dug into the pvfs2-client code and the library to see
whether it reuses TCP connections or makes new ones on each request
(that's deeper into the flow code than I remember. =;) )
2) I can write to the underlying filesystem with 1 KB sequential
writes almost as quickly as with 1 MB writes. So it's not the
underlying ext3.
3) The I/O ops/s bottleneck is there even with the null-aio
TroveMethod, so I doubt it's Trove.
4) atime is still getting updated with null-aio, so a metadata
bottleneck is possible.
Some configuration information about the filesystem:
* version 2.8.1
* The strip_size is 4194304. Not that this should matter a great
deal with one server.
* FlowBufferSizeBytes 4194304
* TroveSyncMeta and TroveSyncData are set to no
* I've applied the patch from http://www.pvfs.org/fisheye/rdiff/PVFS?csid=MAIN:slang:20090421161045&u&N
to be sure metadata syncing really is off, though I'm not sure how
to check. =:)
Thanks.
~Milo
PS: Should I send this to the pvfs2-developers list instead?
Apologies if I've used the wrong venue.
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users