Sam and I looked into a problem we found with noncontig-test,
which I'm using as one of the benchmarks in my suite.
Test setup:
pvfs2-fs: MX on 4 data servers; a 5th server is the client
(CVS HEAD).
If I run the test using MX, it fails, but with TCP the test
completes. We had originally thought that this was a problem in
the pint-request code (as the log indicates), but I'm now
wondering why it would fail only when using a different
transport. To rule out the obvious problems, I've run other
benchmarks on the same setup, before and after this error shows
up, and those all run to completion just fine on both MX and
TCP.
Any ideas where to start with this?
thanks,
Kyle
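P.S. A note on the "(22)" in the MX log below: on Linux, errno 22 is EINVAL, which matches the "Invalid argument" text printed next to it. A minimal Python sketch to check that mapping, assuming a Linux client; the helper name `describe_errno` is hypothetical and not part of PVFS2:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a raw errno value.

    Hypothetical helper for reading log lines such as
    'non PVFS2 error code (22)'; not part of PVFS2.
    """
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message for the value.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_errno(22))
```

This only confirms that 22 is the plain system EINVAL rather than a PVFS2-encoded error; it says nothing about which call returned it.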
__Output__
TCP:
[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ mpirun -np
1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah ionodes
file size (MB): 1 buffer size 0
vector length: 10 element count: 1 vector count: 0
striping factor: 0 striping size: -1 collective buffer size: 0
loops: 1 displacement 0
========= Dump done =========
#* no verification possible!
# testing noncontiguous in memory, noncontiguous in file using
independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.331 / 0.331 / 0.331
read bandwidth (min/max/acc [MB/s]) : 0.370 / 0.370 / 0.370
file size: 1024kB size per process: 1023kB
# testing noncontiguous in memory, contiguous in file using
independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.692 / 0.692 / 0.692
read bandwidth (min/max/acc [MB/s]) : 0.766 / 0.766 / 0.766
file size: 1023kB size per process: 1023kB
# testing contiguous in memory, noncontiguous in file using
independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.348 / 0.348 / 0.348
read bandwidth (min/max/acc [MB/s]) : 0.392 / 0.392 / 0.392
file size: 1024kB size per process: 1023kB
MX:
[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ `mpirun -np
1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah ionodes
file size (MB): 1 buffer size 0
vector length: 10 element count: 1 vector count: 0
striping factor: 0 striping size: -1 collective buffer size: 0
loops: 1 displacement 0
========= Dump done =========
#* no verification possible!
# testing noncontiguous in memory, noncontiguous in file using
independent I/O
# vector count = 26214 - access count = 26214
[E 13:39:06.029976] src/io/description/pint-request.c line 95: PINT_process_request: no segments or bytes requested!
[E 13:39:06.030497] [bt] ./noncontig [0x4cd655]
[E 13:39:06.030555] [bt] ./noncontig [0x4b2e01]
[E 13:39:06.030608] [bt] ./noncontig [0x4ae8f1]
[E 13:39:06.030658] [bt] ./noncontig [0x507b62]
[E 13:39:06.030707] [bt] ./noncontig [0x5080dd]
[E 13:39:06.030756] [bt] ./noncontig [0x507e2f]
[E 13:39:06.030806] [bt] ./noncontig [0x4a5030]
[E 13:39:06.030854] [bt] ./noncontig [0x4ae202]
[E 13:39:06.030903] [bt] ./noncontig [0x4ae2d5]
[E 13:39:06.030952] [bt] ./noncontig [0x479ab0]
[E 13:39:06.031001] [bt] ./noncontig [0x41df43]
[E 13:39:06.031072] PVFS_isys_io call: Invalid argument
[0] Error -524286 in MPI_File_write
Undefined dynamic error code
[E 13:39:06.067249] Warning: non PVFS2 error code (22):
[E 13:39:06.067468] Send immediately failed: Invalid argument
[E 13:39:06.067525] Send error: cancelling recv.
[E 13:39:06.067599] Warning: non PVFS2 error code (22):
[E 13:39:06.067651] msgpair failed, will retry: Invalid argument
[E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.067755] *** Non-BMI failure.
[E 13:39:06.074742] Warning: non PVFS2 error code (22):
[E 13:39:06.074795] Send immediately failed: Invalid argument
[E 13:39:06.074843] Send error: cancelling recv.
[E 13:39:06.074900] Warning: non PVFS2 error code (22):
[E 13:39:06.074948] msgpair failed, will retry: Invalid argument
[E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.075046] *** Non-BMI failure.
[E 13:39:06.075396] Warning: non PVFS2 error code (22):
[E 13:39:06.075447] Send immediately failed: Invalid argument
[E 13:39:06.075493] Send error: cancelling recv.
[E 13:39:06.075551] Warning: non PVFS2 error code (22):
[E 13:39:06.075599] msgpair failed, will retry: Invalid argument
[E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to
server mx://bb15: