Scott Atchley wrote:
Kyle,
Are you using mpich-mx or mpich or mpich2? Are you using the bmi_mx code in PVFS cvs? I am not sure if mpich-mx supports non-contiguous data.
I'm using mpich2 (mpich2-1.0.5p4) and CVS head.
I just built with your changes and the changes that follow, and I still get the error. I'll attach the logfile here; I'm not sure it makes any more sense now than it did before :-/.
If you are using the bmi_mx that is in your cvs, please try using the files I sent today (I have not had a chance to update my PVFS2 cvs and create a patch). Error 22 is EINVAL in Linux, and I actually used that in some of my older code.
thanks, Kyle
Also, can you run with PVFS2_DEBUGMASK=all? Can you edit $PVFS2/src/io/bmi/bmi_mx/mx.h so that BMX_DEBUG is 1 and change:
#define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN)
to
#define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN|BMX_DB_ALL)
There will be a lot of output, but it may point out the issue.
Scott
On Aug 2, 2007, at 2:56 PM, Kyle Schochenmaier wrote:
Sam and I looked into a problem we found with the noncontig-test that I'm using as one of the benchmarks in my suite.
Test setup: pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS Head)
If I run the test using MX, it fails, but with TCP the test completes. We had originally thought that this was a problem in the pint-request code (as the log will indicate), but I'm wondering now why it would fail using a different transport.
To clear up the obvious problems: I've run other benchmarks using the same setup, before and after this error shows up, and those all run to completion just fine on both MX and TCP.
Any ideas where to start with this?
thanks, Kyle
__Output__
TCP:
[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah
ionodes file size (MB): 1
buffer size 0
vector length: 10
element count: 1
vector count: 0
striping factor: 0
striping size: -1
collective buffer size: 0
loops: 1
displacement 0
========= Dump done =========
#* no verification possible!
# testing noncontiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.331 / 0.331 / 0.331
read bandwidth (min/max/acc [MB/s]) : 0.370 / 0.370 / 0.370
file size: 1024kB size per process: 1023kB
# testing noncontiguous in memory, contiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.692 / 0.692 / 0.692
read bandwidth (min/max/acc [MB/s]) : 0.766 / 0.766 / 0.766
file size: 1023kB size per process: 1023kB
# testing contiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
write bandwidth (min/max/acc [MB/s]) : 0.348 / 0.348 / 0.348
read bandwidth (min/max/acc [MB/s]) : 0.392 / 0.392 / 0.392
file size: 1024kB size per process: 1023kB
MX:
[EMAIL PROTECTED]:~/framework/noncontig-test/noncontig$ `mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
========= Parameter space dump =========
filename: pvfs2://tmp/pvfs2/blah
ionodes file size (MB): 1
buffer size 0
vector length: 10
element count: 1
vector count: 0
striping factor: 0
striping size: -1
collective buffer size: 0
loops: 1
displacement 0
========= Dump done =========
#* no verification possible!
# testing noncontiguous in memory, noncontiguous in file using independent I/O
# vector count = 26214 - access count = 26214
[E 13:39:06.029976] src/io/description/pint-request.c line 95: PINT_process_request: no segments or bytes requested!
[E 13:39:06.030497] [bt] ./noncontig [0x4cd655]
[E 13:39:06.030555] [bt] ./noncontig [0x4b2e01]
[E 13:39:06.030608] [bt] ./noncontig [0x4ae8f1]
[E 13:39:06.030658] [bt] ./noncontig [0x507b62]
[E 13:39:06.030707] [bt] ./noncontig [0x5080dd]
[E 13:39:06.030756] [bt] ./noncontig [0x507e2f]
[E 13:39:06.030806] [bt] ./noncontig [0x4a5030]
[E 13:39:06.030854] [bt] ./noncontig [0x4ae202]
[E 13:39:06.030903] [bt] ./noncontig [0x4ae2d5]
[E 13:39:06.030952] [bt] ./noncontig [0x479ab0]
[E 13:39:06.031001] [bt] ./noncontig [0x41df43]
[E 13:39:06.031072] PVFS_isys_io call: Invalid argument
[0] Error -524286 in MPI_File_write Undefined dynamic error code
[E 13:39:06.067249] Warning: non PVFS2 error code (22):
[E 13:39:06.067468] Send immediately failed: Invalid argument
[E 13:39:06.067525] Send error: cancelling recv.
[E 13:39:06.067599] Warning: non PVFS2 error code (22):
[E 13:39:06.067651] msgpair failed, will retry: Invalid argument
[E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.067755] *** Non-BMI failure.
[E 13:39:06.074742] Warning: non PVFS2 error code (22):
[E 13:39:06.074795] Send immediately failed: Invalid argument
[E 13:39:06.074843] Send error: cancelling recv.
[E 13:39:06.074900] Warning: non PVFS2 error code (22):
[E 13:39:06.074948] msgpair failed, will retry: Invalid argument
[E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
[E 13:39:06.075046] *** Non-BMI failure.
[E 13:39:06.075396] Warning: non PVFS2 error code (22):
[E 13:39:06.075447] Send immediately failed: Invalid argument
[E 13:39:06.075493] Send error: cancelling recv.
[E 13:39:06.075551] Warning: non PVFS2 error code (22):
[E 13:39:06.075599] msgpair failed, will retry: Invalid argument
[E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to server mx://bb15:
mx.output.bz2
Description: application/bzip
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
