Hi all,
My OrangeFS 2.8.7 configuration file (running over InfiniBand) is listed below:
<Defaults>
        UnexpectedRequests 50
        EventLogging none
        EnableTracing no
        LogStamp datetime
        BMIModules bmi_ib
        FlowModules flowproto_multiqueue
        PerfUpdateInterval 1000
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000
        PrecreateBatchSize 0,32,512,32,32,32,0
        PrecreateLowThreshold 0,16,256,16,16,16,0

        DataStorageSpace /opt/orangefs/storage/data
        MetadataStorageSpace /opt/orangefs/storage/meta

        LogFile /opt/orangefs/log/server.log
</Defaults>

<Aliases>
        Alias node1 ib://node1:3335
        Alias node10 ib://node10:3335
        Alias node2 ib://node2:3335
        Alias node3 ib://node3:3335
        Alias node4 ib://node4:3335
        Alias node5 ib://node5:3335
        Alias node6 ib://node6:3335
        Alias node7 ib://node7:3335
        Alias node8 ib://node8:3335
        Alias node9 ib://node9:3335
</Aliases>

<Filesystem>
        Name pvfs2-fs
        ID 2093169860
        RootHandle 1048576
        FileStuffing yes
        <MetaHandleRanges>
                Range node1 3-461168601842738792
                Range node10 461168601842738793-922337203685477582
                Range node2 922337203685477583-1383505805528216372
                Range node3 1383505805528216373-1844674407370955162
                Range node4 1844674407370955163-2305843009213693952
                Range node5 2305843009213693953-2767011611056432742
                Range node6 2767011611056432743-3228180212899171532
                Range node7 3228180212899171533-3689348814741910322
                Range node8 3689348814741910323-4150517416584649112
                Range node9 4150517416584649113-4611686018427387902
        </MetaHandleRanges>
        <DataHandleRanges>
                Range node1 4611686018427387903-5072854620270126692
                Range node10 5072854620270126693-5534023222112865482
                Range node2 5534023222112865483-5995191823955604272
                Range node3 5995191823955604273-6456360425798343062
                Range node4 6456360425798343063-6917529027641081852
                Range node5 6917529027641081853-7378697629483820642
                Range node6 7378697629483820643-7839866231326559432
                Range node7 7839866231326559433-8301034833169298222
                Range node8 8301034833169298223-8762203435012037012
                Range node9 8762203435012037013-9223372036854775802
        </DataHandleRanges>
        <StorageHints>
                TroveSyncMeta yes
                TroveSyncData no
                TroveMethod alt-aio
        </StorageHints>
</Filesystem>
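As a sanity check on the handle ranges above, here is a minimal Python sketch that parses `Range` lines and reports any overlapping pairs. The helper names (`parse_ranges`, `find_overlaps`) are made up for illustration and are not part of OrangeFS; the sample uses the first three meta ranges from the config.

```python
import re

# Matches lines such as "Range node1 3-461168601842738792".
RANGE_RE = re.compile(r"^\s*Range\s+(\S+)\s+(\d+)-(\d+)\s*$")


def parse_ranges(text):
    """Return (alias, low, high) tuples for every 'Range' line in text."""
    ranges = []
    for line in text.splitlines():
        m = RANGE_RE.match(line)
        if m:
            ranges.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return ranges


def find_overlaps(ranges):
    """Return (alias_a, alias_b) pairs whose handle ranges overlap."""
    ordered = sorted(ranges, key=lambda r: r[1])
    overlaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        # After sorting by low bound, an overlap means the next range
        # starts at or before the previous range's high bound.
        if cur[1] <= prev[2]:
            overlaps.append((prev[0], cur[0]))
    return overlaps


# First three meta handle ranges, copied from the config above.
SAMPLE = """
    Range node1 3-461168601842738792
    Range node10 461168601842738793-922337203685477582
    Range node2 922337203685477583-1383505805528216372
"""

if __name__ == "__main__":
    print(find_overlaps(parse_ranges(SAMPLE)))  # prints []
```

An empty list means no two of the checked ranges share a handle.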
When I run IOR:
$ mpirun -machinefile /mnt/orangefs/mpd.hosts -np 20 /home/srcs/IOR/src/C/IOR -a MPIIO -N 20 -b 512m -d 10 -t 16m -o /mnt/orangefs/file1 -g -w -r -s 1 -vv
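To put the size of this run in numbers, here is a small Python sketch of the arithmetic implied by the IOR options above (`-N 20`, `-b 512m`, `-t 16m`, `-s 1`). It assumes a single shared file, since `-F` is not given, and that IOR reads the `m` suffix as MiB; the function name `ior_workload` is only illustrative.

```python
MIB = 1024 * 1024


def ior_workload(tasks, block_bytes, transfer_bytes, segments):
    """Return basic size figures for an IOR run writing one shared file."""
    per_task = block_bytes * segments
    return {
        "bytes_per_task": per_task,
        "aggregate_bytes": per_task * tasks,
        "transfers_per_task": per_task // transfer_bytes,
    }


# Values taken from the command line: -N 20 -b 512m -t 16m -s 1
w = ior_workload(tasks=20, block_bytes=512 * MIB,
                 transfer_bytes=16 * MIB, segments=1)

if __name__ == "__main__":
    print(w["aggregate_bytes"] // (1024 ** 3), "GiB in total")   # 10 GiB in total
    print(w["transfers_per_task"], "transfers per task")         # 32 transfers per task
```

So each of the 20 tasks issues 32 transfers of 16 MiB, for 10 GiB written and then read back.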
I get these errors repeatedly:
[E 21:04:12.361706] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 863.
[E 21:04:17.278587] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 1085.
[E 21:04:26.697919] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 1363.
[E 21:04:40.174997] Warning: encourage_recv_incoming: mop_id 2aaab003f0d0 in RTS_DONE message not found.
[E 21:14:40.564589] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 2058.
[E 21:14:40.564608] fp_multiqueue_cancel: flow proto cancel called on 0x195d6608
[E 21:14:40.564613] fp_multiqueue_cancel: I/O error occurred
[E 21:14:40.564618] handle_io_error: flow proto error cleanup started on 0x195d6608: Operation cancelled (possibly due to timeout)
[E 21:14:40.564665] handle_io_error: flow proto 0x195d6608 canceled 1 operations, will clean up.
[E 21:14:40.564871] bmi_to_mem_callback_fn: I/O error occurred
[E 21:14:40.564880] handle_io_error: flow proto 0x195d6608 error cleanup finished: Operation cancelled (possibly due to timeout)
[E 21:14:40.564889] io_datafile_complete_operations: flow failed, retrying from msgpair
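
To see at a glance which kind of operation is timing out, the `job_time_mgr_expire` entries can be tallied with a short Python sketch like the one below. The function name `summarize_timeouts` is made up for illustration; the sample text is copied from the log above, and whitespace is collapsed first so that entries wrapped across lines still match.

```python
import re
from collections import Counter

# One timeout entry: timestamp, operation kind (bmi/flow), and job id.
TIMEOUT_RE = re.compile(
    r"\[E (\d{2}:\d{2}:\d{2})\.\d+\] job_time_mgr_expire: job time out: "
    r"cancelling (\w+) operation, job_id: (\d+)\."
)


def summarize_timeouts(log_text):
    """Count timed-out jobs per operation kind in a server log excerpt."""
    # Collapse newlines and runs of spaces so wrapped entries become one line.
    joined = " ".join(log_text.split())
    return Counter(kind for _time, kind, _job in TIMEOUT_RE.findall(joined))


# Timeout entries copied from the log above, including the line wrapping.
SAMPLE_LOG = """
[E 21:04:12.361706] job_time_mgr_expire: job time out: cancelling bmi
operation, job_id: 863.
[E 21:04:17.278587] job_time_mgr_expire: job time out: cancelling bmi
operation, job_id: 1085.
[E 21:04:26.697919] job_time_mgr_expire: job time out: cancelling bmi
operation, job_id: 1363.
[E 21:14:40.564589] job_time_mgr_expire: job time out: cancelling flow
operation, job_id: 2058.
"""

if __name__ == "__main__":
    for kind, count in summarize_timeouts(SAMPLE_LOG).items():
        print(kind, count)
```

For this excerpt the tally is three BMI timeouts and one flow timeout.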

So what is wrong with this setup?
I asked about this before, but the problem was never solved. Please help me.