Tim,
Thanks for the bug report. I just committed a patch to our development
version (revision 13079). It will go into 1.2b2 soon, after some
soak time. Until then, please use the latest nightly tarball (with a
revision number greater than 13079) from our website.
Thanks,
george.
On Jan 10, 2007, at 5:19 PM, Tim Campbell wrote:
Greetings,
Attached is a small Fortran test program that triggers a failure in
mpi_waitall. The problem is that after a couple of calls
to mpi_startall and mpi_waitall, some of the mpi_requests become
corrupted. This causes the next call to mpi_startall to fail.
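For readers following along, the failing pattern is the standard persistent-request cycle: create the requests once with mpi_send_init/mpi_recv_init, then repeatedly mpi_startall and mpi_waitall them. The sketch below is illustrative only (buffer sizes, tags, and the neighbor-exchange shape are my assumptions, not the attached test program):

```fortran
      program persistent_sketch
c     Illustrative sketch of the persistent-request pattern that
c     triggers the reported corruption; NOT the attached test program.
      implicit none
      include 'mpif.h'
      integer ierr, rank, nprocs, peer, iter
      integer reqs(2), stats(MPI_STATUS_SIZE,2)
      integer sbuf(4), rbuf(4)

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
      peer = mod(rank+1, nprocs)

c     Create the persistent send/recv requests once...
      call mpi_send_init(sbuf, 4, MPI_INTEGER, peer, 0,
     &                   MPI_COMM_WORLD, reqs(1), ierr)
      call mpi_recv_init(rbuf, 4, MPI_INTEGER, peer, 0,
     &                   MPI_COMM_WORLD, reqs(2), ierr)

c     ...then start and complete them repeatedly.  Per the report,
c     the request handles are corrupted after the second waitall,
c     so the next startall fails.
      do iter = 1, 4
         call mpi_startall(2, reqs, ierr)
         call mpi_waitall(2, reqs, stats, ierr)
      end do

      call mpi_request_free(reqs(1), ierr)
      call mpi_request_free(reqs(2), ierr)
      call mpi_finalize(ierr)
      end
```

Note that mpi_waitall on persistent requests should leave the handles inactive but valid for the next mpi_startall; the -32766 values below indicate the Fortran handles are instead being invalidated.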
Here is output from a 2 cpu run.
[44]% mpif90 -g test_ompi.f
[45]% mpirun -np 2 a.out
TEST(A): 0 1 | 2 3 4 5
TEST(B): 0 1 | 2 3 4 5
OUTPUT: 0 1 | 100 100 101 101
TEST(A): 0 2 | 2 3 4 5
TEST(B): 0 2 | -32766 -32766 4 5
OUTPUT: 0 2 | 200 200 201 201
TEST(A): 1 1 | 2 3 4 5
TEST(B): 1 1 | 2 3 4 5
OUTPUT: 1 1 | 101 101 100 100
TEST(A): 1 2 | 2 3 4 5
TEST(B): 1 2 | -32766 -32766 4 5
OUTPUT: 1 2 | 201 201 200 200
^Cmpirun: killing job...
The "-32766" values show up in the mpi_request array after the
second call to mpi_waitall. Using print statements in the Open MPI
code, I have tracked the problem to
ompi/request/req_wait.c:ompi_request_wait_all().
I find that upon entry to ompi_request_wait_all() the values of
request[:]->req_f_to_c_index are valid. However, upon exit from
ompi_request_wait_all() the first two entries of
request[:]->req_f_to_c_index have the value -32766.
I am testing with Open MPI version 1.2b2. This problem occurs on
both x86_64 and i386, and with both the Portland Group compilers
and GCC/G95.
Cheers,
Tim Campbell
Naval Research Laboratory
<test_ompi.f.gz>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users