[EMAIL PROTECTED] wrote on Wed, 21 Mar 2007 16:39 -0700:
> Thanks to the folks who helped me out yesterday, I now have a nice little
> 2.3T PVFS2 (2.6.2) file system. I have 16 nodes that are all acting as I/O
> servers and clients; one of those boxes is also the metadata server. It all
> runs over Topspin IB, and I am using all the default settings in my config
> file.
>
> That being said, I wanted to test the bandwidth, so I compiled the POSIX
> version of IOR against the Topspin mpich libraries.
>
> My run looks like this.
>
> IOR-2.9.4: MPI Coordinated Test of Parallel I/O
>
> Run began: Wed Mar 21 16:06:04 2007
> Command line used: /home/tim/IOR -i 8 -b 1024m -o /mnt/pvfs2/ior/ior_16g
> Machine: Linux compute-0-15.local
>
> Summary:
> api = POSIX
> test filename = /mnt/pvfs2/ior/ior_16g
> access = single-shared-file
> clients = 16 (1 per node)
> repetitions = 8
> xfersize = 262144 bytes
> blocksize = 1 GiB
> aggregate filesize = 16 GiB
>
> access  bw(MiB/s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  iter
> ------  ---------  ----------  ---------  --------  --------  --------  ----
> write   613.70     1048576     256.00     0.177541  26.43     7.24      0
> read    1141.20    1048576     256.00     0.019199  14.34     0.329994  0
> write   589.05     1048576     256.00     0.154706  27.74     7.06      1
> read    1032.93    1048576     256.00     0.019723  15.84     0.417178  1
> write   550.66     1048576     256.00     0.991332  29.58     8.43      2
> read    1005.48    1048576     256.00     0.021340  16.28     0.448091  2
> write   555.06     1048576     256.00     0.232900  29.48     8.57      3
> read    1006.24    1048576     256.00     0.018788  16.27     0.263041  3
> WARNING: Expected aggregate file size = 17179869184.
> WARNING: Stat() of aggregate file size = 13958643712.
> WARNING: Using actual aggregate bytes moved = 17179869184.
> write   438.87     1048576     256.00     0.238877  37.23     15.80     4
> ** error **
> ERROR in aiori-POSIX.c (line 245): hit EOF prematurely.
> ERROR: Success
> ** exiting **
> ** error **
> ERROR in aiori-POSIX.c (line 245): hit EOF prematurely.
>
>
> I would say that the performance is quite good until I hit those errors.
> There is nothing interesting in the client or server logs. Is there
> something in my IOR setup that might be stressing things a bit too hard?
Cleaning my mailbox today. Didn't think you'd hear another reply on
this one, did you? :)
I can reproduce this over IB, using both POSIX and MPI-IO. With MPI-IO
you will see the error messages explicitly, something like:
[E 18:30:04.916174] fp_multiqueue_cancel: flow proto cancel called on 0x6d0100
[E 18:30:04.917227] handle_io_error: flow proto error cleanup started on 0x6d0100, error_code: -1610613121
[E 18:30:04.917247] handle_io_error: flow proto 0x6d0100 canceled 1 operations, will clean up.
[E 18:30:04.917264] handle_io_error: flow proto 0x6d0100 error cleanup finished, error_code: -1610613121
With POSIX, on the other hand, the error messages are generated by the
pvfs2-client-core kernel helper and end up in a file somewhere.
In the MPI-IO case, the run completes with no errors, but the times
are pretty lousy. It is related to timeouts in the flow protocol:
each client allots a certain amount of time to get a response back
from a server, and if it doesn't get one, it cancels the operation
and tries again. It could be that in the POSIX case we don't have
all the error conditions handled properly.
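Roughly, the client-side logic looks like this. This is a minimal
sketch with made-up names and a simulated failure, not the real flow
protocol code; the constants just echo the fs.conf options listed
further down:

    /* Sketch of the timeout/cancel/retry idea described above.
     * All names and the simulated timeout are hypothetical; this
     * is NOT the actual PVFS2 flow protocol code. */
    #include <stdio.h>
    #include <unistd.h>

    enum {
        CLIENT_JOB_FLOW_TIMEOUT_SECS = 300, /* cf. ClientJobFlowTimeoutSecs */
        CLIENT_RETRY_LIMIT = 5,             /* cf. ClientRetryLimit */
        CLIENT_RETRY_DELAY_SECS = 2         /* cf. ClientRetryDelayMilliSecs */
    };

    /* Stand-in for posting an I/O flow and waiting up to timeout_secs
     * for the server's reply.  To keep the sketch runnable it pretends
     * the first two attempts time out (a busy server) and the third
     * succeeds. */
    static int post_flow_and_wait(int timeout_secs, int attempt)
    {
        (void)timeout_secs;
        return attempt < 2 ? -1 : 0;
    }

    static int submit_io(void)
    {
        int attempt;
        for (attempt = 0; attempt <= CLIENT_RETRY_LIMIT; attempt++) {
            if (post_flow_and_wait(CLIENT_JOB_FLOW_TIMEOUT_SECS, attempt) == 0)
                return 0;   /* server answered within the window */
            /* No response in time: cancel the flow, back off, retry. */
            fprintf(stderr, "attempt %d: flow timed out, retrying\n", attempt);
            sleep(CLIENT_RETRY_DELAY_SECS);
        }
        return -1;          /* retries exhausted; caller sees an I/O error */
    }

    int main(void)
    {
        return submit_io() == 0 ? 0 : 1;
    }

When the servers are also busy being clients, those windows expire
more often, which is exactly what you're seeing.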
One thing I'll suggest is not running clients on the same nodes as
the servers. With the same run, but with 14 clients against 14 servers
(including 1 metadata server), no timeouts occur. If you insist on
setting things up like this, there are six values in fs.conf that you
can adjust to increase the timeouts (an example snippet follows the
list):
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
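For example, something like this in the <Defaults> section of fs.conf,
which is where the generated config keeps these options. The numbers
here are just guesses to start from, not recommendations:

    <Defaults>
        # ... other defaults unchanged ...
        ServerJobBMITimeoutSecs 120
        ServerJobFlowTimeoutSecs 120
        ClientJobBMITimeoutSecs 600
        ClientJobFlowTimeoutSecs 600
        ClientRetryLimit 10
        ClientRetryDelayMilliSecs 2000
    </Defaults>

You'll most likely need to restart the servers (and remount on the
clients) for new values to take effect.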
Play around with those numbers and you should be able to get IOR to
run to completion. The downside is that if a server dies, you won't
see an error message at the client for a possibly long time.
-- Pete