Sam and I looked into a problem we found with the noncontig-test
that I'm using as one of my benchmarks in my suite.
Test setup:
pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS Head)
If I run the test using MX, it fails, but with TCP the test
completes. We had originally thought that this was a problem in
the pint-request code (as the log will indicate), but I'm
wondering now why it would fail only with a different transport. To
rule out the obvious problems, I've run other benchmarks using
the same setup, before and after this error shows up, and those
all run to completion just fine on both MX and TCP.
Any ideas where to start with this?
thanks,
Kyle
__Output__
TCP:
[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah ionodes
file size (MB): 1 buffer size 0
vector length: 10 element count: 1 vector count: 0
striping factor: 0 striping size: -1 collective buffer size: 0
loops: 1 displacement 0
========= Dump done =========
#* no verification possible!
# testing noncontiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.331 / 0.331 / 0.331
read bandwidth (min/max/acc [MB/s]) : 0.370 / 0.370 / 0.370
file size: 1024kB size per process: 1023kB
# testing noncontiguous in memory, contiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.692 / 0.692 / 0.692
read bandwidth (min/max/acc [MB/s]) : 0.766 / 0.766 / 0.766
file size: 1023kB size per process: 1023kB
# testing contiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.348 / 0.348 / 0.348
read bandwidth (min/max/acc [MB/s]) : 0.392 / 0.392 / 0.392
file size: 1024kB size per process: 1023kB
MX:
[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ `mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah ionodes
file size (MB): 1 buffer size 0
vector length: 10 element count: 1 vector count: 0
striping factor: 0 striping size: -1 collective buffer size: 0
loops: 1 displacement 0
========= Dump done =========
#* no verification possible!
# testing noncontiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
[E 13:39:06.029976] src/io/description/pint-request.c line 95: PINT_process_request: no segments or bytes requested!
[E 13:39:06.030497] [bt] ./noncontig [0x4cd655]
[E 13:39:06.030555] [bt] ./noncontig [0x4b2e01]
[E 13:39:06.030608] [bt] ./noncontig [0x4ae8f1]
[E 13:39:06.030658] [bt] ./noncontig [0x507b62]
[E 13:39:06.030707] [bt] ./noncontig [0x5080dd]
[E 13:39:06.030756] [bt] ./noncontig [0x507e2f]
[E 13:39:06.030806] [bt] ./noncontig [0x4a5030]
[E 13:39:06.030854] [bt] ./noncontig [0x4ae202]
[E 13:39:06.030903] [bt] ./noncontig [0x4ae2d5]
[E 13:39:06.030952] [bt] ./noncontig [0x479ab0]
[E 13:39:06.031001] [bt] ./noncontig [0x41df43]
[E 13:39:06.031072] PVFS_isys_io call: Invalid argument
[0] Error -524286 in MPI_File_write
Undefined dynamic error code
[E 13:39:06.067249] Warning: non PVFS2 error code (22):
[E 13:39:06.067468] Send immediately failed: Invalid argument
[E 13:39:06.067525] Send error: cancelling recv.
[E 13:39:06.067599] Warning: non PVFS2 error code (22):
[E 13:39:06.067651] msgpair failed, will retry: Invalid argument
[E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.067755] *** Non-BMI failure.
[E 13:39:06.074742] Warning: non PVFS2 error code (22):
[E 13:39:06.074795] Send immediately failed: Invalid argument
[E 13:39:06.074843] Send error: cancelling recv.
[E 13:39:06.074900] Warning: non PVFS2 error code (22):
[E 13:39:06.074948] msgpair failed, will retry: Invalid argument
[E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.075046] *** Non-BMI failure.
[E 13:39:06.075396] Warning: non PVFS2 error code (22):
[E 13:39:06.075447] Send immediately failed: Invalid argument
[E 13:39:06.075493] Send error: cancelling recv.
[E 13:39:06.075551] Warning: non PVFS2 error code (22):
[E 13:39:06.075599] msgpair failed, will retry: Invalid argument
[E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to
server mx://bb15: