Kyle,

Are you using mpich-mx or mpich or mpich2? Are you using the bmi_mx code in PVFS cvs? I am not sure if mpich-mx supports non-contiguous data.

If you are using the bmi_mx code in your CVS tree, please try the files I sent today (I have not had a chance to update my PVFS2 CVS and create a patch). Error 22 is EINVAL on Linux, and I actually used that in some of my older code.

Also, can you run with PVFS2_DEBUGMASK=all? Can you edit $PVFS2/src/io/bmi/bmi_mx/mx.h so that BMX_DEBUG is 1 and change:

#define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN)

to

#define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN|BMX_DB_ALL)

There will be a lot of output but it may point out the issue.

Scott

On Aug 2, 2007, at 2:56 PM, Kyle Schochenmaier wrote:

Sam and I looked into a problem we found with the noncontig test that I'm using as one of the benchmarks in my suite.

Test setup:
pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS Head)

If I run the test using MX, it fails, but with TCP it completes. We had originally thought this was a problem in the pint-request code (as the log indicates), but now I'm wondering why it would fail only on one transport. To rule out the obvious problems, I've run other benchmarks using the same setup, both before and after this error shows up, and those all run to completion just fine on both MX and TCP.

Any ideas where to start with this?

thanks,
Kyle

__Output__

TCP:

[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah  ionodes
file size (MB): 1 buffer size 0
vector length: 10 element count: 1 vector count: 0
striping factor: 0 striping size: -1 collective buffer size: 0
loops: 1 displacement 0
========= Dump done            =========
#* no verification possible!

# testing noncontiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) :  0.331 /  0.331 /  0.331
read  bandwidth (min/max/acc [MB/s]) :  0.370 /  0.370 /  0.370
file size: 1024kB  size per process: 1023kB

# testing noncontiguous in memory, contiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) :  0.692 /  0.692 /  0.692
read  bandwidth (min/max/acc [MB/s]) :  0.766 /  0.766 /  0.766
file size: 1023kB  size per process: 1023kB

# testing contiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) :  0.348 /  0.348 /  0.348
read  bandwidth (min/max/acc [MB/s]) :  0.392 /  0.392 /  0.392
file size: 1024kB  size per process: 1023kB


MX:
[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output


========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah  ionodes
file size (MB): 1 buffer size 0
vector length: 10 element count: 1 vector count: 0
striping factor: 0 striping size: -1 collective buffer size: 0
loops: 1 displacement 0
========= Dump done            =========
#* no verification possible!

# testing noncontiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
[E 13:39:06.029976] src/io/description/pint-request.c line 95: PINT_process_request: no segments or bytes requested!
[E 13:39:06.030497]     [bt] ./noncontig [0x4cd655]
[E 13:39:06.030555]     [bt] ./noncontig [0x4b2e01]
[E 13:39:06.030608]     [bt] ./noncontig [0x4ae8f1]
[E 13:39:06.030658]     [bt] ./noncontig [0x507b62]
[E 13:39:06.030707]     [bt] ./noncontig [0x5080dd]
[E 13:39:06.030756]     [bt] ./noncontig [0x507e2f]
[E 13:39:06.030806]     [bt] ./noncontig [0x4a5030]
[E 13:39:06.030854]     [bt] ./noncontig [0x4ae202]
[E 13:39:06.030903]     [bt] ./noncontig [0x4ae2d5]
[E 13:39:06.030952]     [bt] ./noncontig [0x479ab0]
[E 13:39:06.031001]     [bt] ./noncontig [0x41df43]
[E 13:39:06.031072] PVFS_isys_io call: Invalid argument
[0] Error -524286 in MPI_File_write
Undefined dynamic error code
[E 13:39:06.067249] Warning: non PVFS2 error code (22):
[E 13:39:06.067468] Send immediately failed: Invalid argument
[E 13:39:06.067525] Send error: cancelling recv.
[E 13:39:06.067599] Warning: non PVFS2 error code (22):
[E 13:39:06.067651] msgpair failed, will retry: Invalid argument
[E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.067755] *** Non-BMI failure.
[E 13:39:06.074742] Warning: non PVFS2 error code (22):
[E 13:39:06.074795] Send immediately failed: Invalid argument
[E 13:39:06.074843] Send error: cancelling recv.
[E 13:39:06.074900] Warning: non PVFS2 error code (22):
[E 13:39:06.074948] msgpair failed, will retry: Invalid argument
[E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.075046] *** Non-BMI failure.
[E 13:39:06.075396] Warning: non PVFS2 error code (22):
[E 13:39:06.075447] Send immediately failed: Invalid argument
[E 13:39:06.075493] Send error: cancelling recv.
[E 13:39:06.075551] Warning: non PVFS2 error code (22):
[E 13:39:06.075599] msgpair failed, will retry: Invalid argument
[E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to server mx://bb15:



_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
