Sorry for the delay in replying.

I think you need to use MPI_INIT_THREAD with a level of MPI_THREAD_MULTIPLE 
instead of MPI_INIT.  This sets up internal locking in Open MPI to protect 
against multiple threads inside the progress engine, etc.

Be aware that only some of Open MPI's transports are THREAD_MULTIPLE safe -- 
see the README for more detail.
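
In code terms, the change is roughly this (a minimal sketch; the error check is just illustrative):

    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        /* this build/transport cannot give full thread support -- bail out */
        fprintf(stderr, "need MPI_THREAD_MULTIPLE, got level %d\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

Checking "provided" matters for exactly the reason above: not every build or transport will actually give you MPI_THREAD_MULTIPLE.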


On Oct 23, 2011, at 1:11 PM, Pedro Gonnet wrote:

> 
> Hi again,
> 
> As promised, I implemented a small program reproducing the error.
> 
> The program's main routine spawns a pthread which calls the function
> "exchange". "exchange" uses MPI_Isend/MPI_Irecv/MPI_Waitany to exchange
> a buffer of double-precision numbers with all other nodes.
> 
> At the same time, the "main" routine exchanges the sum of all the
> buffers using MPI_Allreduce.
> 
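The pattern being described is roughly the following -- a sketch only, not the actual mpitest.c attached below; buffer sizes, the usleep value, and all names are illustrative:

    /* Sketch of the two-thread pattern described above; NOT the attached
       mpitest.c.  Buffer sizes, names, and the usleep value are illustrative. */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define N 1024
    static double buf_out[N];

    /* thread (i): exchange buf_out with every other rank via Isend/Irecv/Waitany */
    static void *exchange(void *arg) {
        int rank, size, k, nr = 0, idx;
        (void)arg;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        double *buf_in = malloc((size_t)size * N * sizeof(double));
        MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));
        for (k = 0; k < size; k++) {
            if (k == rank) continue;
            MPI_Isend(buf_out, N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD, &reqs[nr++]);
            MPI_Irecv(&buf_in[k * N], N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD, &reqs[nr++]);
        }
        for (k = 0; k < nr; k++)                     /* drain the requests one by one */
            MPI_Waitany(nr, reqs, &idx, MPI_STATUS_IGNORE);
        free(reqs); free(buf_in);
        return NULL;
    }

    /* thread (ii) = main: call MPI_Allreduce while the exchange is still in flight */
    int main(int argc, char **argv) {
        int provided;
        double local_sum = 0.0, global_sum;
        pthread_t tid;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        pthread_create(&tid, NULL, exchange, NULL);
        usleep(100);  /* the timing knob -- long enough for the exchange to start */
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        pthread_join(tid, NULL);
        MPI_Finalize();
        return 0;
    }
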
> To compile and run the program, do the following:
> 
>        mpicc -g -Wall mpitest.c -pthread
>        mpirun -np 8 ./a.out
> 
> Timing is, of course, of the essence and you may have to run the program
> a few times or twiddle with the value of "usleep" in line 146 for it to
> hang. To see where things go bad, you can do the following:
> 
>        mpirun -np 8 xterm -e gdb -ex run ./a.out
> 
> Things go bad when MPI_Allreduce is called while any of the threads are
> in MPI_Waitany. The value of "usleep" in line 146 should be long enough
> for all the nodes to have started exchanging data but small enough so
> that they are not done yet.
> 
> Cheers,
> Pedro
> 
> 
> 
> On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote:
>> Short update:
>> 
>> I just installed version 1.4.4 from source (compiled with
>> --enable-mpi-threads), and the problem persists.
>> 
>> I should also point out that if, in thread (ii), I wait for the
>> nonblocking communication in thread (i) to finish, nothing bad happens.
>> But this makes the nonblocking communication somewhat pointless.
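
In code terms that workaround amounts to something like this (sketch only; the thread handle and variable names are placeholders, not from mpitest.c):

    /* workaround sketch: order the reduction after the exchange completes;
       "exchange_thread", "local_sum", "global_sum" are placeholder names */
    pthread_join(exchange_thread, NULL);              /* wait for thread (i) */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);                    /* now safe, but no overlap */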
>> 
>> Cheers,
>> Pedro
>> 
>> 
>> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
>>> Hi all,
>>> 
>>> I am currently working on a multi-threaded hybrid parallel simulation
>>> which uses both pthreads and OpenMPI. The simulation uses several
>>> pthreads per MPI node.
>>> 
>>> My code uses the nonblocking routines MPI_Isend/MPI_Irecv/MPI_Waitany
>>> quite successfully to implement the node-to-node communication. When I
>>> try to interleave other computations during this communication, however,
>>> bad things happen.
>>> 
>>> I have two MPI nodes with two threads each: one thread (i) doing the
>>> nonblocking communication and the other (ii) doing other computations.
>>> At some point, the threads (ii) need to exchange data using
>>> MPI_Allreduce, which fails if the first thread (i) has not completed all
>>> the communication, i.e. if thread (i) is still in MPI_Waitany.
>>> 
>>> Using the in-place MPI_Allreduce, I get a re-run of this bug:
>>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php. If I
>>> don't use in-place, the call to MPI_Waitany (thread (i)) on one of the
>>> MPI nodes waits forever. 
>>> 
>>> My guess is that when thread (ii) calls MPI_Allreduce, it gets
>>> whatever the other node sent with MPI_Isend to thread (i), drops
>>> whatever it should have been getting from the other node's
>>> MPI_Allreduce, and the call to MPI_Waitany hangs.
>>> 
>>> Is this a known issue? Is MPI_Allreduce not designed to work alongside
>>> the nonblocking routines? Is there a "safe" variant of MPI_Allreduce I
>>> should be using instead?
>>> 
>>> I am using OpenMPI version 1.4.3 (version 1.4.3-1ubuntu3 of the package
>>> openmpi-bin in Ubuntu). Both MPI nodes are run on the same dual-core
>>> computer (Lenovo x201 laptop).
>>> 
>>> If you need more information, please do let me know! I'll also try to
>>> cook up a small program reproducing this problem...
>>> 
>>> Cheers and kind regards,
>>> Pedro
>>> 
>>> 
>>> 
>>> 
>> 
> 
> <mpitest.c>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

