Re: [OMPI users] MPI_Allreduce hangs

2012-07-02 Thread Jeff Squyres
On Jun 27, 2012, at 6:32 PM, Martin Siegert wrote: > However, there is another issue that may affect the performance of the 1.6.1 > version. I see a LOT of the following messages on stderr: > > -- > The OpenFabrics (openib) B

Re: [OMPI users] MPI_Allreduce hangs

2012-06-27 Thread Martin Siegert
On Wed, Jun 27, 2012 at 02:30:11PM -0400, Jeff Squyres wrote: > On Jun 27, 2012, at 2:25 PM, Martin Siegert wrote: > > >> http://www.open-mpi.org/~jsquyres/unofficial/openmpi-1.6.1ticket3131r26612M.tar.bz2 > > > > Thanks! I tried this and, indeed, the program (I tested quantum espresso, > > pw.x,

Re: [OMPI users] MPI_Allreduce hangs

2012-06-27 Thread Jeff Squyres
On Jun 27, 2012, at 2:25 PM, Martin Siegert wrote: >> http://www.open-mpi.org/~jsquyres/unofficial/openmpi-1.6.1ticket3131r26612M.tar.bz2 > > Thanks! I tried this and, indeed, the program (I tested quantum espresso, > pw.x, so far) no longer hangs. Good! We're doing a bit more definitive testin

Re: [OMPI users] MPI_Allreduce hangs

2012-06-27 Thread Martin Siegert
Hi Jeff, On Wed, Jun 20, 2012 at 04:16:12PM -0400, Jeff Squyres wrote: > On Jun 20, 2012, at 3:36 PM, Martin Siegert wrote: > > > by now we know of three programs - dirac, wrf, quantum espresso - that > > all hang with openmpi-1.4.x (have not yet checked with openmpi-1.6). > > All of these progra

Re: [OMPI users] MPI_Allreduce hangs

2012-06-20 Thread Jeff Squyres
On Jun 20, 2012, at 3:36 PM, Martin Siegert wrote: > by now we know of three programs - dirac, wrf, quantum espresso - that > all hang with openmpi-1.4.x (have not yet checked with openmpi-1.6). > All of these programs run to completion with the mpiexec commandline > argument: --mca btl_openib_fla

Re: [OMPI users] MPI_Allreduce hangs

2012-06-20 Thread Martin Siegert
Hi, by now we know of three programs - dirac, wrf, quantum espresso - that all hang with openmpi-1.4.x (have not yet checked with openmpi-1.6). All of these programs run to completion with the mpiexec commandline argument: --mca btl_openib_flags 305 We now set this in the global configuration file
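
For readers following the workaround above, a minimal sketch of how the same setting might look in an Open MPI MCA parameter file. The file location is an assumption, since the snippet does not say which file was used; it is typically `etc/openmpi-mca-params.conf` under the installation prefix, or `$HOME/.openmpi/mca-params.conf` for a single user.

```
# Assumed location: <prefix>/etc/openmpi-mca-params.conf
# Equivalent to passing "--mca btl_openib_flags 305" to mpiexec
btl_openib_flags = 305
```

With the parameter set in the file, it applies to every job started from that installation without changing the mpiexec command line.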

Re: [OMPI users] MPI_Allreduce hangs

2012-05-04 Thread Jorge Chiva Segura
Hello, I think that my problem: http://www.open-mpi.org/community/lists/users/2012/05/19182.php is similar to yours. Following the advice in the thread that you posted: http://www.open-mpi.org/community/lists/users/2011/07/16996.php I have tried to run my program adding: -mca btl_openib_flags 305

Re: [OMPI users] MPI_Allreduce hangs

2012-05-04 Thread Martin Siegert
On Tue, Apr 24, 2012 at 04:19:31PM -0400, Brock Palen wrote: > To throw in my $0.02, though it is worth less. > > Were you running this on verb based infiniband? Correct. > We see a problem that we have a work around for even with the newest 1.4.5 > only on IB, we can reproduce it with IMB. I

Re: [OMPI users] MPI_Allreduce hangs

2012-04-24 Thread Brock Palen
To throw in my $0.02, though it is worth less. Were you running this on verb based infiniband? We see a problem that we have a work around for even with the newest 1.4.5 only on IB, we can reproduce it with IMB. You can find an old thread from me about it. Your problem might not be the same.

Re: [OMPI users] MPI_Allreduce hangs

2012-04-24 Thread Jeffrey Squyres
Could you repeat your tests with 1.4.5 and/or 1.5.5? On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote: > Hi, > > I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3). > An strace of one of the processes shows: > > Process 10925 attached with 3 threads - interrupt to quit > [pi

Re: [OMPI users] MPI_Allreduce hangs

2012-04-23 Thread Gus Correa
Hi Martin Not sure this solution will help with your problem, but a workaround for situations where the count number exceeds the maximum 32-bit positive integer is to declare a user defined type, say MPI_Type_Contiguous or MPI_Type_Vector, large enough to aggregate a bunch of your original data (
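
The arithmetic behind the workaround described above can be sketched as follows. This is only an illustration of the counting, not MPI code: the helper name `split_count` and the block size are hypothetical, and the idea is that a derived type made of `block` original elements lets the count passed to MPI stay below the 32-bit limit.

```python
INT_MAX = 2**31 - 1  # largest positive 32-bit signed integer


def split_count(total, block):
    """Split `total` elements into whole blocks of `block` elements.

    Returns (n_blocks, remainder) with total == n_blocks * block + remainder.
    n_blocks would be the count used with the aggregated (derived) type;
    the remainder would be sent with the original element type.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not 0 < block <= INT_MAX:
        raise ValueError("block must fit in a positive 32-bit integer")
    n_blocks, remainder = divmod(total, block)
    if n_blocks > INT_MAX:
        raise ValueError("block is too small: block count still overflows")
    return n_blocks, remainder


if __name__ == "__main__":
    # 3 billion elements do not fit in a 32-bit count, 3000 blocks do.
    print(split_count(3_000_000_000, 1_000_000))
```

Here the 3,000,000,000 elements become 3000 blocks of 1,000,000 elements each with no remainder, so both numbers fit comfortably in a 32-bit integer.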

[OMPI users] MPI_Allreduce hangs

2012-04-23 Thread Martin Siegert
Hi, I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3). An strace of one of the processes shows: Process 10925 attached with 3 threads - interrupt to quit [pid 10927] poll([{fd=17, events=POLLIN}, {fd=16, events=POLLIN}], 2, -1 [pid 10926] select(15, [8 14], [], NULL, NULL [pi