What do you mean when you say "fails"? What you have shown here SHOULD produce an error - it should not crash. The bytemax should not be less than bytes, and in any case should not be negative. It seems that the caller has for some reason passed an improperly set up result structure.

I haven't checked the bmi code, but this appears to be a module that is trying to decide which servers have part of the data for this request. For this we usually set the bytemax to 1 (which says if there is at least one byte on this server, stop and let us know). Maybe we should add an error check for a negative bytemax, but at least in this case it should have called gossip_error.
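
To make the suggested check concrete, here is a minimal sketch of what validating the result limits might look like. It is not the actual PVFS2 code: the struct is a hypothetical stand-in whose field names follow the gdb dump quoted below, the field types are assumed, and check_result_limits is an invented helper name. Note that the test visible at line 88 of the trace (!result->bytemax) only catches a bytemax of zero, so a value like -1291 would pass it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the result structure. Field names follow the
 * gdb dump in this thread; the types are assumptions. */
struct result_sketch {
    int64_t *offset_array;
    int64_t *size_array;
    int32_t segmax;
    int32_t segs;
    int64_t bytemax;
    int64_t bytes;
};

/* Hypothetical helper: returns 0 if the limits are usable, -1 otherwise.
 * A real implementation would presumably also log the failure, e.g. via
 * gossip_error as mentioned above. */
static int check_result_limits(const struct result_sketch *r)
{
    if (r == NULL)
        return -1;
    /* Limits must be strictly positive; this also rejects negative values,
     * which a plain "!r->bytemax" test lets through. */
    if (r->segmax <= 0 || r->bytemax <= 0)
        return -1;
    /* Counts already consumed must be non-negative and within the limits. */
    if (r->segs < 0 || r->bytes < 0)
        return -1;
    if (r->segs > r->segmax || r->bytes > r->bytemax)
        return -1;
    return 0;
}
```

With the values from the trace (segmax = 1, segs = 0, bytemax = -1291, bytes = 0) this sketch returns -1, while the usual "any data on this server" setup with bytemax = 1 is accepted.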

Walt

Scott Atchley wrote:
Hi Sam,

Kyle sent me the code and I compiled it this morning.

First, I was using mpich2-mx compiled with PVFS2 support. It failed with the error that MX was already initialized. Both mpich2-mx and bmi_mx are calling mx_init(). I changed bmi_mx to ignore MX_ALREADY_INITIALIZED.
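
For illustration only, the pattern of tolerating an "already initialized" return might look like the sketch below. It does not use the real MX API: sketch_lib_init and the SKETCH_* codes are hypothetical stand-ins for mx_init() and its return values, and recording whether this caller performed the initialization is an added assumption, not something the bmi_mx change is known to do.

```c
/* Hypothetical stand-ins for a library's return codes. */
typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ALREADY_INITIALIZED,
    SKETCH_OTHER_ERROR
} sketch_return_t;

static int sketch_lib_initialized = 0;

/* Hypothetical stand-in for a library init call that reports an error
 * when it is called a second time in the same process. */
static sketch_return_t sketch_lib_init(void)
{
    if (sketch_lib_initialized)
        return SKETCH_ALREADY_INITIALIZED;
    sketch_lib_initialized = 1;
    return SKETCH_SUCCESS;
}

/* Treat "already initialized" as success. *we_initialized is set to 1 only
 * if this call performed the initialization, so a caller could decide
 * whether it should also be the one to finalize. Returns 0 on success,
 * -1 on any other error. */
static int init_tolerant(int *we_initialized)
{
    sketch_return_t rc = sketch_lib_init();

    if (rc == SKETCH_SUCCESS) {
        *we_initialized = 1;
        return 0;
    }
    if (rc == SKETCH_ALREADY_INITIALIZED) {
        *we_initialized = 0;
        return 0;
    }
    return -1;
}
```

In this sketch the first caller in the process (the MPI library, say) gets we_initialized = 1 and any later caller gets 0, with both seeing success.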

Second, I do not see any errors returned in bmi_mx. It fails in PINT_process_request (see call trace below). The request has segs = 0, bytemax = -1291, and bytes = 0.

It could well be that these values are incorrect due to a bug in bmi_mx that is not flagging an error, but I have no idea.

Can you take a look at this?

Thanks,

Scott


0:  (gdb) b PINT_process_request
0: Breakpoint 2 at 0x4701c8: file src/io/description/pint-request.c, line 72.
0:  (gdb) run -fname pvfs2://mnt/pvfs2/atchley/blah -fsize 1 -timing
0:  Continuing.
0:  ========= Parameter space dump =========
0:  filename: pvfs2://mnt/pvfs2/atchley/blah  ionodes
0:  file size (MB): 1 buffer size 0
0:  vector length: 10 element count: 1 vector count: 0
0:  striping factor: 0 striping size: -1 collective buffer size: 0
0:  loops: 1 displacement 0
0:  ========= Dump done            =========
0:  #* no verification possible!
0: calling noncontigmem_noncontigfile(pvfs2://mnt/pvfs2/atchley/blah, 0x0x2aaaaaaab010, 1048560)
0:
0: # testing noncontiguous in memory, noncontiguous in file using independent I/O
0:  # vector count = 26214 - access count = 26214
0:  calling MPI_File_open(pvfs2://mnt/pvfs2/atchley/blah)
0:  calling MPI_File_set_view()
0:  calling MPI_File_seek()
0:  calling MPI_File_write()
0:  [New Thread 1082132816 (LWP 29290)]
0:  [New Thread 1090525520 (LWP 29291)]
0:
0:  Breakpoint 2, PINT_process_request (req=0x6aea50, mem=0x6aeb00,
0:      rfdata=0x7fffd112b880, result=0x7fffd112b850, mode=2)
0:      at src/io/description/pint-request.c:72
0: 72 void *temp_space = NULL; /* temp copy of req state for size call */
0:  (gdb)
0:  (gdb) bt
0:  #0  PINT_process_request (req=0x6aea50, mem=0x6aeb00, rfdata=0x7fffd112b880,
0:      result=0x7fffd112b850, mode=2) at src/io/description/pint-request.c:72
0:  #1  0x00000000004844e0 in io_find_target_datafiles (mem_req=0x6ad160,
0:      file_req=0x6ae960, file_req_offset=0, dist_p=0x6ae9c0, fs_id=1825963815,
0:      io_type=PVFS_IO_WRITE, input_handle_array=0x6b9510, input_handle_count=4,
0:      handle_index_array=0x6b9240, handle_index_out_count=0x7fffd112b944,
0: sio_handle_index_array=0x6aea30, sio_handle_index_count=0x7fffd112b940)
0:      at src/client/sysint/sys-io.sm:2320
0:  #2  0x0000000000480010 in io_datafile_setup_msgpairs (sm_p=0x6ba4a0,
0:      js_p=0x7fffd112b9f0) at src/client/sysint/sys-io.sm:489
0:  #3  0x0000000000476a66 in PINT_state_machine_next (s=0x6ba4a0,
0:      r=0x7fffd112b9f0) at ./src/common/misc/state-machine-fns.h:158
0: #4 0x0000000000476645 in PINT_client_state_machine_post (sm_p=0x6ba4a0,
0:      pvfs_sys_op=6, op_id=0x7fffd112bb30, user_ptr=0x0)
0:      at src/client/sysint/client-state-machine.c:312
0:  #5  0x000000000047f9fc in PVFS_isys_io (ref=
0:      {handle = 1048563, fs_id = 1825963815, __pad1 = 0}, file_req=0x6ae960,
0:      file_req_offset=0, buffer=0x0, mem_req=0x6ad160, credentials=0x6b8ea0,
0:      resp_p=0x7fffd112bba0, io_type=PVFS_IO_WRITE, op_id=0x7fffd112bb30,
0:      user_ptr=0x0) at src/client/sysint/sys-io.sm:328
0:  #6  0x000000000047facf in PVFS_sys_io (ref=
0:      {handle = 1048563, fs_id = 1825963815, __pad1 = 0}, file_req=0x6ae960,
0:      file_req_offset=0, buffer=0x0, mem_req=0x6ad160, credentials=0x6b8ea0,
0:      resp_p=0x7fffd112bba0, io_type=PVFS_IO_WRITE)
0:      at src/client/sysint/sys-io.sm:351
0:  #7  0x0000000000458cb2 in ADIOI_PVFS2_WriteStrided (fd=0x6b8d00,
0: buf=0x2aaaaaaab010, count=26214, datatype=-1946157050, file_ptr_type=101,
0:      offset=0, status=0x7fffd112be30, error_code=0x7fffd112bd70)
0: at /nfs/home/atchley/projects/mpich2/mpich2-snap-200706132016/src/mpi/romio/adio/ad_pvfs2/ad_pvfs2_write.c:1001
0:  #8  0x000000000041afcb in MPIOI_File_write (mpi_fh=0x6b8d00, offset=0,
0: file_ptr_type=101, buf=0x2aaaaaaab010, count=26214, datatype=-1946157050,
0:      myname=0x63ac74 "MPI_FILE_WRITE", status=0x7fffd112be30)
0: at /nfs/home/atchley/projects/mpich2/mpich2-snap-200706132016/src/mpi/romio/mpi-io/write.c:156
0:  #9  0x000000000041aafd in PMPI_File_write (mpi_fh=0x6b8d00,
0:      buf=0x2aaaaaaab010, count=26214, datatype=-1946157050,
0:      status=0x7fffd112be30)
0: at /nfs/home/atchley/projects/mpich2/mpich2-snap-200706132016/src/mpi/romio/mpi-io/write.c:52
0:  #10 0x000000000040461e in noncontigmem_noncontigfile (
0:      filename=0x668110 "pvfs2://mnt/pvfs2/atchley/blah", buf=0x2aaaaaaab010,
0:      bufsize=1048560, dtype=-1946157050, offset=0, displs=0, finfo=-1677721600,
0:      veclen=10, elmtcount=1, veccount=26214) at noncontig.c:185
0:  #11 0x000000000040738d in main (argc=1, argv=0x7fffd112c608)
0:      at noncontig.c:1020
0:  (gdb) s
0: 74 PVFS_offset contig_offset = 0; /* temp for offset of a contig region */
0:  (gdb)
0:  78          if (!PINT_IS_MEMREQ(mode))
0:  (gdb)
0:  79          gossip_debug(GOSSIP_REQUEST_DEBUG,
0:  (gdb)
0: 81 gossip_debug(GOSSIP_REQUEST_DEBUG,"PINT_process_request\n");
0:  (gdb)
0:  83          if (!req)
0:  (gdb)
0:  88          if (!result || !result->segmax || !result->bytemax)
0:  (gdb) p *result
0: $1 = {offset_array = 0x7fffd112b8a8, size_array = 0x7fffd112b8a0, segmax = 1,
0:    segs = 0, bytemax = -1291, bytes = 0}
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers