Thanks. Do you have any ideas about the other problems listed before? Thanks a lot~
2013/10/24 Elaine Quarles <[email protected]>

> The RTS_DONE error messages indicate a problem that is specific to
> running on Infiniband. This will be fixed in the upcoming 2.8.8
> release.
>
> Thanks,
> Elaine
>
>
> On Wed, Oct 23, 2013 at 2:05 PM, xihuang sun <[email protected]> wrote:
>
>> Another error came out when I ran:
>> mpiexec -machinefile mpd.hosts -np 10 /home/IOR/src/C/IOR -a MPIIO
>> -N 10 -b 1g -d 5 -t 256k -o /mnt/orangefs/file1 -g -w -W -r -s 1 -vv
>>
>> I got:
>> IOR-2.10.3: MPI Coordinated Test of Parallel I/O
>>
>> Run began: Thu Oct 24 04:06:47 2013
>> Command line used: /home/IOR/src/C/IOR -a MPIIO -N 10 -b 1g -d 5
>> -t 256k -o /mnt/orangefs/file1 -g -w -W -r -s 1 -vv
>> Machine: Linux node1 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48
>> EDT 2009 x86_64
>> Using synchronized MPI timer
>> Start time skew across all tasks: 0.00 sec
>> Path: /mnt/orangefs
>> FS: 1.1 TiB   Used FS: 15.8%   Inodes: 8796093022208.0 Mi
>> Used Inodes: 0.0%
>> Participating tasks: 10
>> task 0 on node1
>> task 1 on node2
>> task 2 on node3
>> task 3 on node4
>> task 4 on node5
>> task 5 on node6
>> task 6 on node7
>> task 7 on node8
>> task 8 on node9
>> task 9 on node10
>>
>> Summary:
>>   api                = MPIIO (version=2, subversion=2)
>>   test filename      = /mnt/orangefs/file1
>>   access             = single-shared-file, independent
>>   pattern            = segmented (1 segment)
>>   ordering in a file = sequential offsets
>>   ordering inter file= no tasks offsets
>>   clients            = 10 (1 per node)
>>   repetitions        = 1
>>   xfersize           = 262144 bytes
>>   blocksize          = 1 GiB
>>   aggregate filesize = 10 GiB
>>
>> Using Time Stamp 1382558807 (0x52682c57) for Data Signature
>> delaying 5 seconds . . .
>> Commencing write performance test.
>> Thu Oct 24 04:06:52 2013
>>
>> access  bw(MiB/s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
>> ------  ---------  ----------  ---------  --------  --------  --------  --------  ----
>> write   28.83      1048576     256.00     0.170848  355.01    0.001721  355.18    0     XXCEL
>> Verifying contents of the file(s) just written.
>> Thu Oct 24 04:12:47 2013
>>
>> [E 04:12:49.364807] Warning: encourage_recv_incoming: mop_id 629fe60
>> in RTS_DONE message not found.
>> [E 04:12:50.350602] Warning: encourage_recv_incoming: mop_id
>> 2aaaae91aaf0 in RTS_DONE message not found.
>> [E 04:12:50.613899] Warning: encourage_recv_incoming: mop_id
>> 2aaaac009a30 in RTS_DONE message not found.
>> [E 04:12:51.175232] Warning: encourage_recv_incoming: mop_id 6f07940
>> in RTS_DONE message not found.
>>
>> I think maybe I am on the wrong track.
>>
>> Thanks for the help~
>>
>>
>>
>> 2013/10/24 xihuang sun <[email protected]>
>>
>>> Hi,
>>> I'm using orangefs 2.8.7 over InfiniBand and testing with IOR; some
>>> errors are shown below.
>>> Here is the command:
>>> mpiexec -machinefile mpd.hosts -np 4 /home/IOR/src/C/IOR -a MPIIO
>>> -N 4 -b 1g -d 5 -t 2m -o /mnt/orangefs/file1 -c -g -w -W -r -s 1 -vv
>>>
>>> It SOMETIMES produces the errors below:
>>>
>>> [E 02:52:21.394237] fp_multiqueue_cancel: flow proto cancel called
>>> on 0x12bff388
>>> [E 02:52:21.394326] fp_multiqueue_cancel: I/O error occurred
>>> [E 02:52:21.394333] handle_io_error: flow proto error cleanup
>>> started on 0x12bff388: Operation cancelled (possibly due to timeout)
>>> [E 02:52:21.394341] handle_io_error: flow proto 0x12bff388 canceled
>>> 1 operations, will clean up.
>>> [E 02:52:21.394349] mem_to_bmi_callback_fn: I/O error occurred
>>> [E 02:52:21.394353] handle_io_error: flow proto 0x12bff388 error
>>> cleanup finished: Operation cancelled (possibly due to timeout)
>>> [E 02:52:21.394358] io_datafile_complete_operations: flow failed,
>>> retrying from msgpair
>>>
>>> And SOMETIMES not all of the data gets written into /mnt/orangefs
>>> (e.g., in the 4 GB case from the example command, only 3.2 GB is
>>> written to /mnt/orangefs and the program gets stuck, with no error
>>> message shown).
>>>
>>> I don't know whether these two situations are connected, or whether
>>> my command is wrong.
>>>
>>> I tried to solve it by using pvfs2-set-sync to set -D and -M to 1,
>>> but then writes dropped to a very low rate:
>>>
>>> Max Write: 17.61 MiB/sec (18.47 MB/sec)
>>> Max Read:  3403.92 MiB/sec (3569.27 MB/sec)
>>>
>>> So my question is: what is wrong here?
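Until 2.8.8 ships, one way to confirm that the RTS_DONE warnings really
are specific to the Infiniband path is to rerun the same IOR test with
the client going over TCP instead. This is only a sketch, and it assumes
your servers are also configured with a tcp:// alias; the endpoint below
is hypothetical, so substitute your real server name, port, and
filesystem name:

    # /etc/pvfs2tab -- hypothetical TCP endpoint for an A/B test
    tcp://node1:3334/orangefs  /mnt/orangefs  pvfs2  defaults,noauto  0 0

If the warnings disappear over bmi_tcp but return over bmi_ib, that is
consistent with Elaine's diagnosis.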
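On the pvfs2-set-sync question: setting -D (data) and -M (metadata) to 1
makes the servers sync on every operation, so a large drop in write
bandwidth is expected rather than a sign of breakage. The same behavior
can also be set persistently in the server configuration. A minimal
sketch, assuming the usual fs.conf StorageHints layout for 2.8.x (check
the option names against your own generated config):

    <StorageHints>
        # sync file data to disk on every write -- safest, slowest
        TroveSyncData yes
        # sync metadata to disk on every operation
        TroveSyncMeta yes
    </StorageHints>

Leaving TroveSyncData off trades durability on a server crash for much
higher write rates, which is the common default for data.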
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
