Milo,

I would second Kyle's suggestion about trying to get performance to an expected level with 1 client and 1 server.

Does your RAID hardware tell you what I/O request size works best for it? If so, set FlowBufferSizeBytes to that value, and match max_sectors_kb in the directory listed below.

/sys/block/sdX/queue (where X is the letter(s) of your device) has parameters you can tweak to tune the block device.
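For example, something like this (sdX and the 512 value here are only placeholders -- check what your controller actually prefers, and note the value can't exceed max_hw_sectors_kb):

    # see the largest I/O request size (in KB) the kernel will currently issue
    cat /sys/block/sdX/queue/max_sectors_kb
    # raise it to match the RAID's preferred request size, e.g. 512 KB
    echo 512 > /sys/block/sdX/queue/max_sectors_kb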

Mounting XFS with noatime and nobarrier gives us the best performance on our SAN.
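For example (the device and mount point here are just placeholders for your setup):

    mount -t xfs -o noatime,nobarrier /dev/sdX1 /mnt/pvfs2-storage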

You could also try the directio Trove back end, but if your RAID hardware already handles a high IOP count, it may not help much.
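If you want to experiment with it, TroveMethod goes in the StorageHints section of the server config, something like this (I'm writing this from memory, so double-check against your own fs.conf):

    <StorageHints>
        TroveMethod directio
    </StorageHints>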

You might try this for your networking, in /etc/sysctl.conf:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.route.flush = 1
net.ipv4.tcp_rfc1337 = 1
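After editing, you can load the new values without a reboot:

    sysctl -p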

You definitely want jumbo frames if all your hardware supports them.
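Something like this, assuming eth0 is the interface on your PVFS2 network (the interface name and the 9000-byte MTU are just placeholders):

    ifconfig eth0 mtu 9000

Remember that every NIC and switch port in the path has to agree on the MTU, or you'll see worse performance, not better.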

I've found from experience that tuning is very hard. Lots of knobs to turn...

kevin

On Jul 23, 2009, at 12:59 PM, Milo wrote:

Hi, guys. I'm getting surprisingly poor performance on my PVFS2 cluster. Here's the setup:

*) 13 nodes running the PVFS2 2.8.1 server on Linux kernel 2.6.28-13, each with a 15-drive RAID-5 array.

*) The RAID-5 array gets 60 MB/s local write speeds with XFS, according to iozone (writing in 4 MB records).

I'd like to get at least 50 MB/s per server out of the cluster, and I've been testing this with a single PVFS2 server and client, with the client running either on the same node or on a node on the same switch (it doesn't seem to make much difference). The server is configured with Trove syncing off, a simple_stripe distribution with a 4 MB strip size, and a 1 MB FlowBufferSizeBytes. Results have been as follows:
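For reference, the relevant parts of my fs.conf look roughly like this (reconstructed from memory, so the exact section placement may be off):

        <Defaults>
            FlowBufferSizeBytes 1048576
        </Defaults>

        <FileSystem>
            <StorageHints>
                TroveSyncData no
            </StorageHints>
            <Distribution>
                Name simple_stripe
                Param strip_size
                Value 4194304
            </Distribution>
        </FileSystem>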

With TroveMethod alt-aio or default, I'm getting around 15 MB/s when transferring a 3 GB file through pvfs2-cp:

        r...@ss239:~# pvfs2-cp -t ./file.3g /mnt/pvfs2/out
        Wrote 2867200000 bytes in 192.811478 seconds. 14.181599 MB/seconds

dd'ing a similar file through pvfs2fuse gets about a third of that performance, 5 MB/s:

        r...@ss239:~# dd if=/dev/zero of=/mnt/pvfs2fuse/out bs=1024K count=1024
        1024+0 records in
        1024+0 records out
        1073741824 bytes (1.1 GB) copied, 206.964 s, 5.2 MB/s

I get similar results using iozone writing through the fuse client.

If I switch the method to null-aio, things speed up a lot, but it's still suspiciously slow:

        r...@ss239:~# pvfs2-cp -t ./file.out /mnt/pvfs2fuse/out7-nullaio
        Wrote 2867200000 bytes in 60.815127 seconds. 44.962086 MB/seconds

        r...@ss239:~# dd if=/dev/zero of=/mnt/pvfs2fuse/out-nullaio bs=1024K count=1024
        1024+0 records in
        1024+0 records out
        1073741824 bytes (1.1 GB) copied, 21.0201 s, 51.1 MB/s

I suspect there's a network bottleneck somewhere. I'm going to try adjusting the MTU as Jim just did, but are there any other configuration options I should look into?

Thanks.

~Milo
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
