On Sat, Feb 17, 2007 at 02:32:13PM +0100, Michael Kuhn wrote:
> The program basically writes and reads data using combinations of
> (non-)collective and (non-)contiguous I/O. It seems this error only
> occurs if we do non-collective, contiguous I/O (level 0) with multiple
> processes. Less processes and other levels work just fine. (The number
> of iterations our program does also seems to play a role; 2 iterations
> work, 3 produce the error.)

With the test case this was easy to track down (thanks again!), but
it's proving harder to come up with a solid fix.

The problem is that I do not correctly handle incrementing the
independent file pointer in our PVFS driver when it sees the kinds of
types you are passing in combined with multiple calls to the MPI-IO
independent file pointer routines. Because we messed up the type
handling, data would be corrupted on the second call to
MPI_File_write, and, as Pete diagnosed, we'd start running off the end
of our typemap array at the third iteration.

In the short term, if you can do all your I/O in a single call to
MPI_File_write, you'll avoid this bug. In the longer term, I'll send
you a patch, but I'm going to be on travel Thursday and Friday. I hope
I can put something together early next week.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
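
As a rough illustration of the short-term workaround mentioned above
(one MPI_File_write call instead of one per iteration), here is a
minimal sketch. Everything in it is an assumption for illustration and
not taken from the test program: the helper names stage_chunks and
write_once are hypothetical, the data is treated as plain bytes
(MPI_BYTE) that land contiguously in the file, and the total size is
assumed to fit in an int. The MPI part only compiles when USE_MPI is
defined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_MPI
#include <mpi.h>
#endif

/* Hypothetical helper: copy nchunks buffers of chunk_len bytes each into
 * one newly allocated contiguous buffer. The caller frees the result. */
static char *stage_chunks(char *const *chunks, int nchunks, size_t chunk_len,
                          size_t *total_len)
{
    assert(chunks != NULL && nchunks > 0 && chunk_len > 0);

    size_t total = (size_t)nchunks * chunk_len;
    char *staged = malloc(total);
    if (staged == NULL)
        return NULL;

    /* Lay the per-iteration buffers out back to back. */
    for (int i = 0; i < nchunks; i++)
        memcpy(staged + (size_t)i * chunk_len, chunks[i], chunk_len);

    *total_len = total;
    return staged;
}

#ifdef USE_MPI
/* Hypothetical wrapper: issue a single independent write for all chunks,
 * so the independent file pointer is only advanced once.
 * Assumes the total byte count fits in an int. */
static int write_once(MPI_File fh, char *const *chunks, int nchunks,
                      size_t chunk_len)
{
    size_t total = 0;
    char *staged = stage_chunks(chunks, nchunks, chunk_len, &total);
    if (staged == NULL)
        return MPI_ERR_NO_MEM;

    int rc = MPI_File_write(fh, staged, (int)total, MPI_BYTE,
                            MPI_STATUS_IGNORE);
    free(staged);
    return rc;
}
#endif
```

The trade-off is extra memory for the staging buffer, and this only
matches the original output if the separate writes would have landed
back to back in the file; if the program sets a file view with a
derived type, the staged layout would need to follow that view.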
