On Jul 23, 2009, at 12:59 PM, Milo wrote:
> Hi, guys. I'm getting surprisingly poor performance on my PVFS2
> cluster. Here's the setup:
> *) 13 nodes running PVFS2 2.8.1 with Linux kernel 2.6.28-13 server,
>    each with a 15-drive RAID-5 array.
> *) The RAID-5 array gets 60 MB/s local write speed with XFS
>    according to iozone (writing in 4 MB records).
Doing a component test of the device with the local file system is a good
idea, but you should see what you get with 1 MB records, since that's
what you will be doing with PVFS (using a 1 MB flow buffer size). I
wouldn't expect a difference, but just for consistency... Also,
iozone doesn't normally include syncs with the timing, so you may see
caching effects in that number. You would see them with PVFS too
since you disabled TroveDataSync, but if the I/O is larger in the PVFS
case you'll start to see actual disk I/O when you run out of memory.
How big are your iozone runs?
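
For what it's worth, a run along these lines would exercise that mode
(the mount point and file size here are just placeholders; size the file
well past RAM so the page cache can't hide the disk):

  # write-only test, 1 MB records, fsync counted in the timing (-e)
  iozone -i 0 -r 1m -s 8g -e -f /mnt/xfs-test/iozone.tmp
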
I actually tend to use sgdd instead of iozone when doing a local
filesystem test like this, because it allows me to specify exactly when/if
to sync, use O_DIRECT, set the block size, etc. That tells me exactly
which mode works best for my server/hardware setup, so that I can
choose the right parameters for PVFS.
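
If sgdd isn't handy, plain GNU dd can exercise the same modes; the paths
and sizes below are only placeholders:

  # buffered writes, but fsync before the clock stops
  dd if=/dev/zero of=/mnt/xfs-test/dd.tmp bs=1M count=4096 conv=fsync
  # bypass the page cache entirely with O_DIRECT
  dd if=/dev/zero of=/mnt/xfs-test/dd.tmp bs=1M count=4096 oflag=direct
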
Also, a component test of the network with iperf or netpipe or
something would show you what to expect there. PVFS includes a BMI
pingpong test in the test suite that shows point-to-point performance
too.
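
For example, with iperf (hostnames are placeholders; the defaults are TCP
on port 5001):

  # on the server node
  iperf -s
  # on the client node, a 30-second test against it
  iperf -c <server-hostname> -t 30
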
> I'd like to get at least 50 MB/s per server from the cluster, and I've
> been testing this with a single PVFS2 server and client, with the
> client running either on the same node or on a node on the same switch
> (it doesn't seem to make much difference). The server is configured
> with Trove syncing off, a 4 MB strip size with the simple_stripe
> distribution, and a 1 MB FlowBufferSizeBytes. Results have been as
> follows:
> With TroveMethod set to alt-aio or left at the default, I'm getting
> around 15 MB/s when transferring a 3 GB file through pvfs2-cp:
>
> r...@ss239:~# pvfs2-cp -t ./file.3g /mnt/pvfs2/out
> Wrote 2867200000 bytes in 192.811478 seconds. 14.181599 MB/seconds
>
> dd'ing a similar file through pvfs2fuse gets about a third of that
> performance, 5 MB/s:
>
> r...@ss239:~# dd if=/dev/zero of=/mnt/pvfs2fuse/out bs=1024K count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 206.964 s, 5.2 MB/s
> I get similar results using iozone writing through the fuse client.
>
> If I switch the method to null-aio, things speed up a lot, but it's
> still suspiciously slow:
>
> r...@ss239:~# pvfs2-cp -t ./file.out /mnt/pvfs2fuse/out7-nullaio
> Wrote 2867200000 bytes in 60.815127 seconds. 44.962086 MB/seconds
>
> r...@ss239:~# dd if=/dev/zero of=/mnt/pvfs2fuse/out-nullaio bs=1024K count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 21.0201 s, 51.1 MB/s
You're only going from one client, so you're limited to single-link
bandwidth, which isn't going to be more than ~120 MB/s I would guess
(GigE, right?). So you're getting a little over a third of that, which
still isn't good. This isn't incast, since you're doing writes, but one
thing you can try is writing to fewer servers. You can specify a stripe
width on a per-file basis, so you can compare performance with 2, 4, and
8 servers. You can set it on a directory with setfattr:

setfattr -n "user.pvfs2.num_dfiles" -v "8" dir

Or just use pvfs2-xattr if you don't have the kernel module.
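
For example, to set up directories for a quick sweep (directory names are
placeholders; as I understand it, files created afterward in each
directory pick up that datafile count):

  mkdir /mnt/pvfs2/w2 /mnt/pvfs2/w4 /mnt/pvfs2/w8
  setfattr -n "user.pvfs2.num_dfiles" -v "2" /mnt/pvfs2/w2
  setfattr -n "user.pvfs2.num_dfiles" -v "4" /mnt/pvfs2/w4
  setfattr -n "user.pvfs2.num_dfiles" -v "8" /mnt/pvfs2/w8
  # check that the hint took
  getfattr -n user.pvfs2.num_dfiles /mnt/pvfs2/w4
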
-sam
> I suspect there's some network bottleneck. I'm going to try to
> adjust the MTU as Jim just did. But are there any other
> configuration options I should look into?
>
> Thanks.
> ~Milo
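
A note on the MTU experiment mentioned above: on Linux the interface MTU
can be raised with ip link (the interface name is a placeholder, and the
switch and every host on the path have to agree on the larger frame size):

  ip link set dev eth0 mtu 9000
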
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users