Thanks. I had also posted the bug on the MPICH2 list, and received an answer from the ROMIO maintainers: the issue seems to be related to NFS file locking bugs. I had been testing on an NFS file system, and when I re-tested on a local (ext3) file system, I could not reproduce the bug.
I had been experimenting with MPI-IO using explicit offsets, individual pointers, and shared pointers, and have workarounds, so I'll just avoid shared pointers on NFS.

Best regards,

Yvan Fournier
EDF R&D

On Sat, 2008-08-16 at 08:19 -0400, users-requ...@open-mpi.org wrote:
> Date: Sat, 16 Aug 2008 08:05:14 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] bug in MPI_File_get_position_shared ?
> To: Open MPI Users <us...@open-mpi.org>
> Cc: mpich2-ma...@mcs.anl.gov
> Message-ID: <023f1db0-8e8d-4c8c-8156-80ae52ff0...@cisco.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> On Aug 13, 2008, at 7:06 PM, Yvan Fournier wrote:
>
> > I seem to have encountered a bug in MPI-IO, in which
> > MPI_File_get_position_shared hangs when called by multiple processes in
> > a communicator. It can be illustrated by the following simple test case,
> > in which a file is simply created with C IO, and opened with MPI-IO.
> > (defining or undefining MY_MPI_IO_BUG on line 5 enables/disables the
> > bug). From the MPI2 documentation, It seems that all processes should be
> > able to call MPI_File_get_position_shared, but if more than one process
> > uses it, it fails. Setting the shared pointer helps, but this should not
> > be necessary, and the code still hangs (in more complete code, after
> > writing data).
> >
> > I encounter the same problem with Open MPI 1.2.6 and MPICH2 1.0.7, so
> > I may have misread the documentation, but I suspect a ROMIO bug.
>
> Bummer. :-(
>
> It would be best to report this directly to the ROMIO maintainers via
> romio-ma...@mcs.anl.gov. They lurk on this list, but they may not be
> paying attention to every mail.
>
> If you wouldn't mind, please CC me on the mail to romio-maint. Thanks!
>
> --
> Jeff Squyres
> Cisco Systems
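
For anyone hitting the same hang, here is a minimal sketch of the bookkeeping behind the explicit-offset workaround mentioned above. It is only an illustration, not the code from the original test case: the helper name `explicit_offset` and the `nbytes` array are hypothetical, and plain `long long` stands in for `MPI_Offset` so the snippet compiles without MPI. The idea is that each rank's write position is the sum of the byte counts of all lower ranks, so no shared file pointer (and none of the file locking that goes with it) is needed.

```c
#include <assert.h>

/* Hypothetical helper: byte offset at which `rank` should start writing,
 * given nbytes[r] = number of bytes contributed by rank r.
 * This is an exclusive prefix sum; in an MPI program each rank would
 * normally obtain the same value with MPI_Exscan (using MPI_SUM) on its
 * own byte count, and then pass it as the offset argument of
 * MPI_File_write_at instead of relying on the shared file pointer. */
long long explicit_offset(const long long *nbytes, int rank)
{
    long long offset = 0;

    assert(nbytes != 0 && rank >= 0);

    /* Sum the contributions of all ranks strictly below `rank`. */
    for (int r = 0; r < rank; r++)
        offset += nbytes[r];

    return offset;
}
```

With byte counts of 100, 40, 0 and 60 for ranks 0 to 3, the ranks would write at offsets 0, 100, 140 and 140 respectively; the rank contributing nothing simply shares its offset with the next one.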