Thanks.
Do you have any idea about the other problems listed earlier in the thread (the flow-protocol errors and the incomplete writes)? Thanks a lot~


2013/10/24 Elaine Quarles <[email protected]>

> The RTS_DONE error messages indicate a problem that is specific to running
> on InfiniBand. This will be fixed in the upcoming 2.8.8 release.
>
> Thanks,
> Elaine
>
>
> On Wed, Oct 23, 2013 at 2:05 PM, xihuang sun <[email protected]> wrote:
>
>> Another error came up when I ran:
>> mpiexec -machinefile mpd.hosts -np 10 /home/IOR/src/C/IOR -a MPIIO -N 10 -b 1g -d 5 -t 256k -o /mnt/orangefs/file1 -g -w -W -r -s 1 -vv
>>
>> I got:
>> IOR-2.10.3: MPI Coordinated Test of Parallel I/O
>>
>> Run began: Thu Oct 24 04:06:47 2013
>> Command line used:  /home/IOR/src/C/IOR -a MPIIO -N 10 -b 1g -d 5 -t 256k -o /mnt/orangefs/file1 -g -w -W -r -s 1 -vv
>> Machine: Linux node1 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64
>> Using synchronized MPI timer
>> Start time skew across all tasks: 0.00 sec
>> Path: /mnt/orangefs
>> FS: 1.1 TiB   Used FS: 15.8%   Inodes: 8796093022208.0 Mi   Used Inodes: 0.0%
>> Participating tasks: 10
>> task 0 on node1
>> task 1 on node2
>> task 2 on node3
>> task 3 on node4
>> task 4 on node5
>> task 5 on node6
>> task 6 on node7
>> task 7 on node8
>> task 8 on node9
>> task 9 on node10
>>
>> Summary:
>>         api                = MPIIO (version=2, subversion=2)
>>         test filename      = /mnt/orangefs/file1
>>         access             = single-shared-file, independent
>>         pattern            = segmented (1 segment)
>>         ordering in a file = sequential offsets
>>         ordering inter file= no tasks offsets
>>         clients            = 10 (1 per node)
>>         repetitions        = 1
>>         xfersize           = 262144 bytes
>>         blocksize          = 1 GiB
>>         aggregate filesize = 10 GiB
>>
>> Using Time Stamp 1382558807 (0x52682c57) for Data Signature
>> delaying 5 seconds . . .
>> Commencing write performance test.
>> Thu Oct 24 04:06:52 2013
>>
>> access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
>> ------    ---------  ---------- ---------  --------   --------   --------   --------   ----
>> write     28.83      1048576    256.00     0.170848   355.01     0.001721   355.18     0    XXCEL
>> Verifying contents of the file(s) just written.
>> Thu Oct 24 04:12:47 2013
>>
>> [E 04:12:49.364807] Warning: encourage_recv_incoming: mop_id 629fe60 in RTS_DONE message not found.
>> [E 04:12:50.350602] Warning: encourage_recv_incoming: mop_id 2aaaae91aaf0 in RTS_DONE message not found.
>> [E 04:12:50.613899] Warning: encourage_recv_incoming: mop_id 2aaaac009a30 in RTS_DONE message not found.
>> [E 04:12:51.175232] Warning: encourage_recv_incoming: mop_id 6f07940 in RTS_DONE message not found.
>>
>> I think maybe I am going about this the wrong way.
>>
>> Thanks for the help~
>>
>>
>>
>> 2013/10/24 xihuang sun <[email protected]>
>>
>>> Hi,
>>> I'm using OrangeFS 2.8.7 over InfiniBand and testing with IOR; the errors I am seeing are below. Here is the command:
>>> mpiexec -machinefile mpd.hosts -np 4 /home/IOR/src/C/IOR -a MPIIO -N 4 -b 1g -d 5 -t 2m -o /mnt/orangefs/file1 -c -g -w -W -r -s 1 -vv
>>>
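>>> For reference, here is roughly what those options ask IOR to do (flag meanings as I understand them from the IOR 2.10.3 usage text; this breakdown is an annotation added for clarity, not output from the run):
>>>
>>>   -a MPIIO                # use the MPI-IO backend
>>>   -N 4                    # 4 participating tasks (matches mpiexec -np 4)
>>>   -b 1g                   # each task's block is 1 GiB
>>>   -d 5                    # delay 5 seconds between tests
>>>   -t 2m                   # 2 MiB per transfer
>>>   -o /mnt/orangefs/file1  # shared test file on the OrangeFS mount
>>>   -c                      # collective MPI-IO operations
>>>   -g                      # barriers between open, I/O, and close phases
>>>   -w -W -r                # write the file, verify what was written, then read it back
>>>   -s 1                    # a single segment per task
>>>   -vv                     # extra verbosity
>>>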
>>> It SOMETIMES produces the errors below:
>>>
>>> [E 02:52:21.394237] fp_multiqueue_cancel: flow proto cancel called on 0x12bff388
>>> [E 02:52:21.394326] fp_multiqueue_cancel: I/O error occurred
>>> [E 02:52:21.394333] handle_io_error: flow proto error cleanup started on 0x12bff388: Operation cancelled (possibly due to timeout)
>>> [E 02:52:21.394341] handle_io_error: flow proto 0x12bff388 canceled 1 operations, will clean up.
>>> [E 02:52:21.394349] mem_to_bmi_callback_fn: I/O error occurred
>>> [E 02:52:21.394353] handle_io_error: flow proto 0x12bff388 error cleanup finished: Operation cancelled (possibly due to timeout)
>>> [E 02:52:21.394358] io_datafile_complete_operations: flow failed, retrying from msgpair
>>>
>>> And SOMETIMES not all of the data gets written to /mnt/orangefs. For example, with the command above the aggregate file size is 4 tasks x 1 GiB = 4 GiB, but only about 3.2 GB ends up in /mnt/orangefs and the program hangs, with no error message shown.
>>>
>>> I don't know whether these two situations are connected, or whether my command is wrong.
>>>
>>> I tried to work around it with pvfs2-set-sync, setting -D and -M to 1 (roughly as sketched below), but then the write rate dropped very low:
>>>
>>> Max Write: 17.61 MiB/sec (18.47 MB/sec)
>>> Max Read:  3403.92 MiB/sec (3569.27 MB/sec)
>>>
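>>> For reference, the change was along these lines (a rough sketch, not the exact invocation; the -m mount-point argument is an assumption on my part, so check pvfs2-set-sync -h for the real syntax on your install):
>>>
>>>   # Assumption: pvfs2-set-sync takes the OrangeFS mount point via -m.
>>>   # -D 1 / -M 1 presumably turn on per-operation data and metadata syncing
>>>   # (the TroveSyncData / TroveSyncMeta behavior), which would explain the
>>>   # very low write rate shown above.
>>>   pvfs2-set-sync -m /mnt/orangefs -D 1 -M 1
>>>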
>>> So my question is: what is going wrong here?
>>>
>>
>>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
