The RTS_DONE error messages indicate a problem that is specific to running
over InfiniBand. This will be fixed in the upcoming 2.8.8 release.

Thanks,
Elaine


On Wed, Oct 23, 2013 at 2:05 PM, xihuang sun <[email protected]> wrote:

> Another error comes up when I run:
> mpiexec -machinefile mpd.hosts -np 10 /home/IOR/src/C/IOR -a MPIIO -N 10
> -b 1g -d 5 -t 256k -o /mnt/orangefs/file1 -g -w -W -r -s 1 -vv
>
> I got
> IOR-2.10.3: MPI Coordinated Test of Parallel I/O
>
> Run began: Thu Oct 24 04:06:47 2013
> Command line used:  /home/IOR/src/C/IOR -a MPIIO -N 10 -b 1g -d 5 -t 256k
> -o /mnt/orangefs/file1 -g -w -W -r -s 1 -vv
> Machine: Linux node1 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009
> x86_64
> Using synchronized MPI timer
> Start time skew across all tasks: 0.00 sec
> Path: /mnt/orangefs
> FS: 1.1 TiB   Used FS: 15.8%   Inodes: 8796093022208.0 Mi   Used Inodes:
> 0.0%
> Participating tasks: 10
> task 0 on node1
> task 1 on node2
> task 2 on node3
> task 3 on node4
> task 4 on node5
> task 5 on node6
> task 6 on node7
> task 7 on node8
> task 8 on node9
> task 9 on node10
>
> Summary:
>         api                = MPIIO (version=2, subversion=2)
>         test filename      = /mnt/orangefs/file1
>         access             = single-shared-file, independent
>         pattern            = segmented (1 segment)
>         ordering in a file = sequential offsets
>         ordering inter file= no tasks offsets
>         clients            = 10 (1 per node)
>         repetitions        = 1
>         xfersize           = 262144 bytes
>         blocksize          = 1 GiB
>         aggregate filesize = 10 GiB
>
> Using Time Stamp 1382558807 (0x52682c57) for Data Signature
> delaying 5 seconds . . .
> Commencing write performance test.
> Thu Oct 24 04:06:52 2013
>
> access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
> ------    ---------  ---------- ---------  --------   --------   --------   --------   ----
> write     28.83      1048576    256.00     0.170848   355.01     0.001721   355.18     0    XXCEL
> Verifying contents of the file(s) just written.
> Thu Oct 24 04:12:47 2013
>
> [E 04:12:49.364807] Warning: encourage_recv_incoming: mop_id 629fe60 in
> RTS_DONE message not found.
> [E 04:12:50.350602] Warning: encourage_recv_incoming: mop_id 2aaaae91aaf0
> in RTS_DONE message not found.
> [E 04:12:50.613899] Warning: encourage_recv_incoming: mop_id 2aaaac009a30
> in RTS_DONE message not found.
> [E 04:12:51.175232] Warning: encourage_recv_incoming: mop_id 6f07940 in
> RTS_DONE message not found.
>
> I think I may be on the wrong track.
>
> Thanks for the help.
>
>
>
> 2013/10/24 xihuang sun <[email protected]>
>
>> Hi,
>> I'm using OrangeFS 2.8.7 over InfiniBand and testing with IOR. Some errors
>> are shown below.
>> Here is the command:
>> mpiexec -machinefile mpd.hosts -np 4 /home/IOR/src/C/IOR -a MPIIO -N 4 -b
>> 1g -d 5 -t 2m -o /mnt/orangefs/file1 -c -g -w -W -r -s 1 -vv
>>
>> It SOMETIMES gets the error below:
>>
>> [E 02:52:21.394237] fp_multiqueue_cancel: flow proto cancel called on
>> 0x12bff388
>> [E 02:52:21.394326] fp_multiqueue_cancel: I/O error occurred
>> [E 02:52:21.394333] handle_io_error: flow proto error cleanup started on
>> 0x12bff388: Operation cancelled (possibly due to timeout)
>> [E 02:52:21.394341] handle_io_error: flow proto 0x12bff388 canceled 1
>> operations, will clean up.
>> [E 02:52:21.394349] mem_to_bmi_callback_fn: I/O error occurred
>> [E 02:52:21.394353] handle_io_error: flow proto 0x12bff388 error cleanup
>> finished: Operation cancelled (possibly due to timeout)
>> [E 02:52:21.394358] io_datafile_complete_operations: flow failed,
>> retrying from msgpair
>>
>> and SOMETIMES not all of the data is written to /mnt/orangefs (for example,
>> in the 4 GB case from the command above, only 3.2 GB is written to
>> /mnt/orangefs and the program hangs, with no error message shown).
>>
>> I don't know whether these two situations are connected. Or is my
>> command wrong?
>>
>> I tried to solve it by using pvfs2-set-sync to set -D and -M to 1, and the
>> write rate dropped to a very low value:
>>
>> Max Write: 17.61 MiB/sec (18.47 MB/sec)
>> Max Read:  3403.92 MiB/sec (3569.27 MB/sec)
>>
>> So my question is: what is wrong here?
>>
>
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
>