On Sat, Feb 17, 2007 at 02:32:13PM +0100, Michael Kuhn wrote:
> The program basically writes and reads data using combinations of
> (non-)collective and (non-)contiguous I/O. It seems this error only
> occurs if we do non-collective, contiguous I/O (level 0) with multiple
> processes. Less processes and other levels work just fine. (The number
> of iterations our program does also seems to play a role; 2 iterations
> work, 3 produce the error.)

With the test case this was easy to track down (thanks again!), but
it's proving harder to come up with a solid fix. 

The problem is that I do not correctly increment the independent file
pointer in our PVFS driver when you pass in the kinds of datatypes you
are using and make multiple calls to the MPI-IO independent file
pointer routines.

Because we messed up type handling, data would be corrupted on the
second call to MPI_File_write, and as Pete diagnosed, we'd start
running off the end of our typemap array at the third iteration.

In the short term, if you can do all your I/O in a single call to
MPI_File_write, you'll avoid this bug.  In the longer term, I'll send
you a patch, but I'm going to be on travel Thursday and Friday.  I
hope I can put something together early next week.
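
A minimal sketch of the single-call workaround, in case it helps.  It
assumes every iteration writes the same number of bytes and that one
write through your current file view puts the data where the separate
writes would have.  pack_iterations is a hypothetical helper name, not
part of MPI, ROMIO, or PVFS, and the MPI call appears only in a comment
so the packing step stands alone:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of MPI, ROMIO, or PVFS): copy `niter`
 * chunks of `chunk_len` bytes each into one freshly allocated,
 * contiguous buffer.  Returns NULL if the total size would overflow or
 * the allocation fails; the caller frees the result. */
static char *pack_iterations(const char *const *chunks, size_t niter,
                             size_t chunk_len, size_t *total_len)
{
    assert(chunks != NULL && total_len != NULL);

    /* Refuse sizes whose product does not fit in a size_t. */
    if (chunk_len != 0 && niter > (size_t)-1 / chunk_len)
        return NULL;
    *total_len = niter * chunk_len;

    /* Allocate at least one byte so a zero-length request still
     * returns a pointer that can be passed to free(). */
    char *buf = malloc(*total_len ? *total_len : 1);
    if (buf == NULL)
        return NULL;

    /* Lay the chunks out back to back, in iteration order. */
    for (size_t i = 0; i < niter; i++)
        memcpy(buf + i * chunk_len, chunks[i], chunk_len);

    return buf;
}

/* With MPI available, the packed buffer would then go out in one call
 * instead of one call per iteration, for example:
 *
 *     MPI_File_write(fh, buf, (int)total_len, MPI_BYTE, &status);
 *
 * `fh` and `status` are placeholders for the caller's open MPI_File
 * and an MPI_Status. */
```

Since there is only one independent-file-pointer call, the pointer
never has to be advanced between writes, which is the step that goes
wrong.  The cost is one extra copy of the data in memory.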

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
