Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-23 Thread George Bosilca
On Mon, Feb 23, 2015 at 4:37 PM, Joshua Ladd wrote: > Nathan, > > I do, but the hang comes later on. It looks like it's a situation where > the root is way, way faster than the children and he's inducing an > overrun in the unexpected message queue. I think the queue is

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-23 Thread Joshua Ladd
Nathan, I do, but the hang comes later on. It looks like it's a situation where the root is way, way faster than the children and he's inducing an overrun in the unexpected message queue. I think the queue is set to just keep growing and it eventually blows up the memory?
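
The overrun described above can be illustrated with a toy model. This is only a sketch and not Open MPI code: the function name `peak_queue_depth` and its parameters are hypothetical, and the "queue" is a plain counter standing in for the unexpected message queue. It assumes the root posts one message per step, a slow child consumes one message every `child_period` steps, and an optional synchronization every `sync_every` messages (standing in for something like a periodic `MPI_Barrier` in the broadcast loop) forces the root to wait until the child has caught up.

```c
#include <stddef.h>

/* Toy model, NOT Open MPI internals: counts how many messages pile up
 * when a fast root outpaces a slow child.
 *
 *   messages     - number of messages the root sends in total
 *   child_period - the child consumes one message every child_period steps
 *   sync_every   - if nonzero, the root waits for the child to drain the
 *                  queue after every sync_every messages (barrier stand-in)
 *
 * Returns the peak number of queued, not yet consumed messages. */
size_t peak_queue_depth(size_t messages, size_t child_period, size_t sync_every)
{
    size_t queued = 0;
    size_t peak = 0;

    if (child_period == 0) {
        child_period = 1;               /* avoid division by zero */
    }

    for (size_t sent = 1; sent <= messages; ++sent) {
        ++queued;                       /* root sends without waiting */
        if (queued > peak) {
            peak = queued;
        }
        if (sent % child_period == 0 && queued > 0) {
            --queued;                   /* slow child consumes one message */
        }
        if (sync_every != 0 && sent % sync_every == 0) {
            queued = 0;                 /* root blocks until the child catches up */
        }
    }
    return peak;
}
```

In this model, `peak_queue_depth(1000, 4, 0)` grows with the number of messages, while `peak_queue_depth(1000, 4, 10)` can never exceed 10, because the counter is reset at every synchronization point. Whether the real hang in this thread has the same cause is exactly what the posters are still trying to establish.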

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-23 Thread Nathan Hjelm
Josh, do you see a hang when using vader? It is preferred over the old sm btl. -Nathan On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote: >Sachin, > >I am able to reproduce something funny. Looks like your issue. When I run >on a single host with two ranks, the test works

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-23 Thread Joshua Ladd
Sachin, I am able to reproduce something funny. Looks like your issue. When I run on a single host with two ranks, the test works fine. However, when I try three or more, it looks like only the root, rank 0, is making any progress after the first iteration.

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-22 Thread Sachin Krishnan
George, I was able to run the code without any errors with an older version of Open MPI on another machine. It looks like some problem with my machine, like Josh pointed out. Adding --mca coll tuned or basic to the mpirun command resulted in an MPI_Init failed error with the following additional

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-22 Thread George Bosilca
Sachin, I can't replicate your issue with either the latest 1.8 or the trunk. I tried using a single host, while forcing SM and then TP, to no avail. Can you try restricting the collective modules in use (adding --mca coll tuned,basic to your mpirun command)? George. On Fri, Feb 20,

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-20 Thread Sachin Krishnan
Josh, Thanks for the help. I'm running on a single host. How do I confirm that it is an issue with the shared memory? Sachin On Fri, Feb 20, 2015 at 11:58 PM, Joshua Ladd wrote: > Sachin, > > Are you running this on a single host or across multiple hosts (i.e. are > you

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-20 Thread Joshua Ladd
Sachin, Are you running this on a single host or across multiple hosts (i.e. are you communicating between processes via networking)? If it's on a single host, then it might be an issue with shared memory. Josh On Fri, Feb 20, 2015 at 1:51 AM, Sachin Krishnan wrote: >

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-20 Thread Sachin Krishnan
Hello Josh, The command I use to compile the code is: mpicc bcast_loop.c To run the code I use: mpirun -np 2 ./a.out Output is unpredictable. It gets stuck at different places. I'm attaching lstopo and ompi_info outputs. Do you need any other info? lstopo-no-graphics output: Machine

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-18 Thread Joshua Ladd
Sachin, Can you, please, provide a command line? Additional information about your system could be helpful also. Josh On Wed, Feb 18, 2015 at 3:43 AM, Sachin Krishnan wrote: > Hello, > > I am new to MPI and also this list. > I wrote an MPI code with several MPI_Bcast calls