[EMAIL PROTECTED] wrote on Fri, 29 Dec 2006 10:43 -0500:
> What performance do you typically see with a single client and single  
> server (not the same machine) with 10 Gb/s NICs?

1 metaserver, 1 IO server, 1 client, 16 MB flow buffer sizes.  Here
are some similarly uninteresting numbers on IB, with the server running
at maybe around 50% load.  (Only 800 MB, else I fall into swap.):

ib30$ pvfs2-cp -t /tmp/tmpfs/800m /pvfs-ib/x1
Wrote 838860800 bytes in 3.035549 seconds. 263.543767 MB/seconds

ib30$ pvfs2-cp -t -b $((64*1024*1024)) /tmp/tmpfs/800m /pvfs-ib/x1
Wrote 838860800 bytes in 2.237504 seconds. 357.541259 MB/seconds

pvfs2-cp isn't that great a benchmark.  Find yourself an MPI-IO
benchmark, like "perf".  This produces server load around 90%:

ib30$ mpiexec -n 1 2402/perf -n 10 -s 800m -c 100m -f pvfs2:/pvfs-ib/x1
#np size   chunk  write no sync- read no sync-- write sync---- read sync-----
#   (MB)    (MB)  (MB/s)         (MB/s)         (MB/s)         (MB/s)
1    800.0  100.0 681.56 +-  1.9 612.87 +-  2.2 679.99 +-  1.0 613.76 +-  3.0

With 1 MB flow buffers, the server is pegged and slower, more like
what you're seeing:

ib30$ mpiexec -n 1 2402/perf -n 10 -s 800m -c 100m -f pvfs2:/pvfs-ib/x1
#np size   chunk  write no sync- read no sync-- write sync---- read sync-----
#   (MB)    (MB)  (MB/s)         (MB/s)         (MB/s)         (MB/s)
1    800.0  100.0 342.73 +-  3.8 317.24 +-  2.6 343.96 +-  2.2 318.34 +-  1.8

It's important to keep the flow buffer size comparable to the
network speed.  The default of 256 kB is too small even for GigE.
The stripe size only comes into play with multiple IO servers, and
it wants to be large too.
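For reference, something like the following in the server's fs.conf is
where those two knobs live -- but I'm writing the option names from
memory, so check them against the fs.conf documentation for your
version before trusting this fragment:

```
# Flow buffer size: bump well past the 256 kB default for fast networks
FlowBufferSizeBytes 16777216

# Stripe size (only matters with multiple IO servers)
<Distribution>
    Name simple_stripe
    Param strip_size
    Value 4194304
</Distribution>
```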

> On the same machine, if I use dd to copy from /dev/zero to /mnt/tmpfs/ 
> zeros using 1 MB blocks, I get 300 MB/s for a 1 GB file.

This is wrong.  You should get 700-900 MB/s for memcpy on a recent
vintage machine.  Data in tmpfs will go to swap if you exceed the
free memory on the box.  Watch for that.

> Initially, I used the dumbest of BMI_meth_memalloc() and  
> BMI_meth_memfree(), where they are simply calls to malloc() and free 
> (), and I was getting about 300 MB/s. Thinking that this was the  
> problem, I tinkered with mallopt() to set higher thresholds for trim  
> and mmap. This added about 50 MB/s.
> 
> Next, I added pre-malloced memory on startup and I manage a list of  
> these buffers. This added another 50 MB/s to get me to 400 MB/s.

IB uses malloc/free, but caches freed blocks to avoid costly
re-registration later, handing out an old block on future malloc
calls.  You probably don't care about registration, but we added
a hook, BMI_OPTIMISTIC_BUFFER_REG, so the IO client can tell the
BMI device about the user-supplied buffer rather than seeing lots
of 64 kB buffers.

> I  
> tried playing with pvfs2-cp's -b option but performance never  
> improved over the default behavior. Interestingly, on the client,  
> pvfs2-cp only uses two 1 MB buffers (over and over) for the entire 1  
> GB transfer. Is this intentional? Does this mean, that only one  
> buffer is in flight while the other is being filled? Is there a way  
> to get pvfs2-cp to use more concurrent messages?

pvfs2-cp is not exactly optimized for performance.  Don't spend too
much time worrying about it.

                -- Pete

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
