Nobody really got back to me on this, but I've switched over to the OpenFabrics stack due to some other issues I was facing with the Topspin stack. So I'll give it another go against OpenFabrics and let the list know how it goes. If that goes badly, I'll fall back to IPoIB.
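For anyone following along, the IPoIB fallback would just mean pointing the clients at tcp:// URIs instead of the ib:// ones, roughly like this in /etc/pvfs2tab (a sketch only -- the hostname, port, and file system name below are placeholders; the real values have to match the Aliases in your server fs.conf):

    # hypothetical client mount entry for tcp over the IPoIB interface
    # (compute-0-0, port 3334 and "pvfs2-fs" are stand-ins for your own setup)
    tcp://compute-0-0:3334/pvfs2-fs  /mnt/pvfs2  pvfs2  defaults,noauto  0 0

As I understand it, the servers would also need to be reachable over tcp (tcp:// Aliases and the tcp BMI module enabled in fs.conf) before the clients can mount that way.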
These are all Dell 1950 x86_64 Woodcrest boxes. Disk space should not have been an issue; each node has about 100G of free disk (a quick way to re-check that across all the servers is sketched at the very end of this message).

Thanks

Tim

-----Original Message-----
From: Murali Vilayannur [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 05, 2007 4:31 AM
To: Carlson, Timothy S
Cc: [email protected]
Subject: Re: [Pvfs2-users] IOR errors

Hi Tim,
I don't know if anyone responded to this email or if it got lost. You could try a couple of things and also provide some more information:
- Are these Opteron/x86_64 boxes?
- Can you try this out on tcp instead of ib, if possible? That would help us rule out any IB-specific oddities.
- The writes may have hit ENOSPC on one or more servers. Would it be possible to check the amount of available disk space on all the servers?
I will try to reproduce this on a much smaller run, although I doubt anything will show up since the nightly tests would have caught it. Sorry for not being able to help more.
Thanks,
Murali

On 3/21/07, Carlson, Timothy S <[EMAIL PROTECTED]> wrote:
> Thanks to the folks who helped me out yesterday, I now have a nice little
> 2.3T pvfs2 (2.6.2) file system. I have 16 nodes that are all acting as I/O
> servers and clients. One of those boxes is also the metadata server.
> All of this runs over Topspin IB, and I am using all the default settings
> in my config file.
>
> That being said, I wanted to test the bandwidth, so I compiled the
> POSIX version of IOR against the Topspin mpich libraries.
>
> My run looks like this:
>
> IOR-2.9.4: MPI Coordinated Test of Parallel I/O
>
> Run began: Wed Mar 21 16:06:04 2007
> Command line used: /home/tim/IOR -i 8 -b 1024m -o /mnt/pvfs2/ior/ior_16g
> Machine: Linux compute-0-15.local
>
> Summary:
>     api                = POSIX
>     test filename      = /mnt/pvfs2/ior/ior_16g
>     access             = single-shared-file
>     clients            = 16 (1 per node)
>     repetitions        = 8
>     xfersize           = 262144 bytes
>     blocksize          = 1 GiB
>     aggregate filesize = 16 GiB
>
> access    bw(MiB/s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  iter
> ------    ---------  ----------  ---------  --------  --------  --------  ----
> write     613.70     1048576     256.00     0.177541  26.43     7.24      0
> read      1141.20    1048576     256.00     0.019199  14.34     0.329994  0
> write     589.05     1048576     256.00     0.154706  27.74     7.06      1
> read      1032.93    1048576     256.00     0.019723  15.84     0.417178  1
> write     550.66     1048576     256.00     0.991332  29.58     8.43      2
> read      1005.48    1048576     256.00     0.021340  16.28     0.448091  2
> write     555.06     1048576     256.00     0.232900  29.48     8.57      3
> read      1006.24    1048576     256.00     0.018788  16.27     0.263041  3
> WARNING: Expected aggregate file size       = 17179869184.
> WARNING: Stat() of aggregate file size      = 13958643712.
> WARNING: Using actual aggregate bytes moved = 17179869184.
> write     438.87     1048576     256.00     0.238877  37.23     15.80     4
> ** error **
> ERROR in aiori-POSIX.c (line 245): hit EOF prematurely.
> ERROR: Success
> ** exiting **
> ** error **
> ERROR in aiori-POSIX.c (line 245): hit EOF prematurely.
>
> I would say that the performance is quite good until I get to those
> errors. Nothing interesting in the client or server logs. Is there something
> in my IOR setup that might be stressing things a bit too hard?
>
> Thanks for any insights.
>
> Tim
>
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: [EMAIL PROTECTED]
> Pacific Northwest National Laboratory
> HPCaNS: High Performance Computing and Networking Services

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
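As promised above, here is the sort of quick check that rules the ENOSPC theory in or out across all 16 I/O servers (a sketch only -- it assumes passwordless ssh to the nodes, and the node names and storage path are placeholders; point df at whatever StorageSpace is set to in your server config):

    # report free space under the PVFS2 storage directory on every I/O server
    # (compute-0-0 .. compute-0-15 and /pvfs2-storage-space are stand-ins)
    for n in $(seq 0 15); do
        echo -n "compute-0-$n: "
        ssh compute-0-$n "df -h /pvfs2-storage-space | tail -1"
    done

If any server comes back nearly full, that would line up with the ENOSPC possibility Murali raised.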
