Hi all,
My OrangeFS 2.8.7 configuration file, for a setup running over InfiniBand, is listed below:
<Defaults>
UnexpectedRequests 50
EventLogging none
EnableTracing no
LogStamp datetime
BMIModules bmi_ib
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
PrecreateBatchSize 0,32,512,32,32,32,0
PrecreateLowThreshold 0,16,256,16,16,16,0
DataStorageSpace /opt/orangefs/storage/data
MetadataStorageSpace /opt/orangefs/storage/meta
LogFile /opt/orangefs/log/server.log
</Defaults>
<Aliases>
Alias node1 ib://node1:3335
Alias node10 ib://node10:3335
Alias node2 ib://node2:3335
Alias node3 ib://node3:3335
Alias node4 ib://node4:3335
Alias node5 ib://node5:3335
Alias node6 ib://node6:3335
Alias node7 ib://node7:3335
Alias node8 ib://node8:3335
Alias node9 ib://node9:3335
</Aliases>
<Filesystem>
Name pvfs2-fs
ID 2093169860
RootHandle 1048576
FileStuffing yes
<MetaHandleRanges>
Range node1 3-461168601842738792
Range node10 461168601842738793-922337203685477582
Range node2 922337203685477583-1383505805528216372
Range node3 1383505805528216373-1844674407370955162
Range node4 1844674407370955163-2305843009213693952
Range node5 2305843009213693953-2767011611056432742
Range node6 2767011611056432743-3228180212899171532
Range node7 3228180212899171533-3689348814741910322
Range node8 3689348814741910323-4150517416584649112
Range node9 4150517416584649113-4611686018427387902
</MetaHandleRanges>
<DataHandleRanges>
Range node1 4611686018427387903-5072854620270126692
Range node10 5072854620270126693-5534023222112865482
Range node2 5534023222112865483-5995191823955604272
Range node3 5995191823955604273-6456360425798343062
Range node4 6456360425798343063-6917529027641081852
Range node5 6917529027641081853-7378697629483820642
Range node6 7378697629483820643-7839866231326559432
Range node7 7839866231326559433-8301034833169298222
Range node8 8301034833169298223-8762203435012037012
Range node9 8762203435012037013-9223372036854775802
</DataHandleRanges>
<StorageHints>
TroveSyncMeta yes
TroveSyncData no
TroveMethod alt-aio
</StorageHints>
</Filesystem>
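A quick way to rule out a typo in the handle ranges above is to check that no two ranges overlap. This is only a minimal sketch in Python, not part of OrangeFS: the names `parse_ranges` and `find_overlaps` are made up for illustration, and it assumes every range is written as `Range <alias> <start>-<end>` on its own line, as in the config above.

```python
import re

# Matches lines such as: Range node1 3-461168601842738792
RANGE_RE = re.compile(r"^\s*Range\s+(\S+)\s+(\d+)-(\d+)\s*$")


def parse_ranges(config_text):
    """Return a list of (alias, start, end) for every 'Range' line."""
    ranges = []
    for line in config_text.splitlines():
        m = RANGE_RE.match(line)
        if m:
            ranges.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return ranges


def find_overlaps(ranges):
    """Return neighbouring ranges (sorted by start) that intersect.

    If any two ranges overlap at all, at least one neighbouring pair
    overlaps, so an empty result means the ranges are disjoint.
    """
    ordered = sorted(ranges, key=lambda r: r[1])
    overlaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] <= prev[2]:
            overlaps.append((prev, cur))
    return overlaps


if __name__ == "__main__":
    # Two lines copied from the config above; in practice, read the whole file.
    sample = (
        "Range node1 3-461168601842738792\n"
        "Range node10 461168601842738793-922337203685477582\n"
    )
    print(find_overlaps(parse_ranges(sample)))
```

Meta and data ranges share one handle space, so it makes sense to feed the whole file to `parse_ranges` in one go rather than checking each section separately.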
When I run IOR:
$ mpirun -machinefile /mnt/orangefs/mpd.hosts -np 20 \
    /home/srcs/IOR/src/C/IOR -a MPIIO -N 20 -b 512m -d 10 -t 16m \
    -o /mnt/orangefs/file1 -g -w -r -s 1 -vv
I get these errors repeatedly:
[E 21:04:12.361706] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 863.
[E 21:04:17.278587] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 1085.
[E 21:04:26.697919] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 1363.
[E 21:04:40.174997] Warning: encourage_recv_incoming: mop_id 2aaab003f0d0 in RTS_DONE message not found.
[E 21:14:40.564589] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 2058.
[E 21:14:40.564608] fp_multiqueue_cancel: flow proto cancel called on 0x195d6608
[E 21:14:40.564613] fp_multiqueue_cancel: I/O error occurred
[E 21:14:40.564618] handle_io_error: flow proto error cleanup started on 0x195d6608: Operation cancelled (possibly due to timeout)
[E 21:14:40.564665] handle_io_error: flow proto 0x195d6608 canceled 1 operations, will clean up.
[E 21:14:40.564871] bmi_to_mem_callback_fn: I/O error occurred
[E 21:14:40.564880] handle_io_error: flow proto 0x195d6608 error cleanup finished: Operation cancelled (possibly due to timeout)
[E 21:14:40.564889] io_datafile_complete_operations: flow failed, retrying from msgpair
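To see at a glance how many of the timeouts are BMI cancellations and how many are flow cancellations, the log can be tallied with a short script. This is a rough sketch in Python under two assumptions: each log entry sits on a single line (as in the log file itself, before mail wrapping), and the `job_time_mgr_expire` lines look exactly like the ones quoted above. The names `parse_timeouts` and `count_by_kind` are hypothetical helpers, not OrangeFS tools.

```python
import re
from collections import Counter

# Matches e.g.:
# [E 21:04:12.361706] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 863.
TIMEOUT_RE = re.compile(
    r"^\[E (\d{2}):(\d{2}):(\d{2})\.(\d+)\] job_time_mgr_expire: "
    r"job time out: cancelling (\w+) operation, job_id: (\d+)\."
)


def parse_timeouts(log_text):
    """Return (seconds_since_midnight, kind, job_id) for each timeout line."""
    events = []
    for line in log_text.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            hours, minutes, seconds = (int(m.group(i)) for i in (1, 2, 3))
            stamp = hours * 3600 + minutes * 60 + seconds
            events.append((stamp, m.group(5), int(m.group(6))))
    return events


def count_by_kind(events):
    """Count timeouts per cancelled operation kind (e.g. 'bmi', 'flow')."""
    return Counter(kind for _, kind, _ in events)


if __name__ == "__main__":
    # Two entries copied from the log above; in practice, read the log file.
    sample = (
        "[E 21:04:12.361706] job_time_mgr_expire: job time out: "
        "cancelling bmi operation, job_id: 863.\n"
        "[E 21:14:40.564589] job_time_mgr_expire: job time out: "
        "cancelling flow operation, job_id: 2058.\n"
    )
    print(count_by_kind(parse_timeouts(sample)))
```

The per-event timestamps make it possible to compare the gaps between cancellations with the timeout values set in the `<Defaults>` section.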
So what is wrong with this setup?
I asked about this before, but the problem was not solved. Please help.