Open MPI 1.3.4 is pretty ancient.  Can you upgrade to Open MPI 1.6?

On Jun 27, 2012, at 2:59 PM, William Au wrote:

> Hi, 
> 
> When I run multiple processes on a single machine, the programs hang 
> in mpi_allreduce at different points during different runs. I am using 
> Open MPI 1.3.4. When I run the processes on different machines, it is OK. 
> Also, when I recompile Open MPI in debug mode, the problem goes away. 
> Since the hangs occur at different points, I suspect a race or deadlock 
> caused by some optimization in Open MPI. I compiled with -O3 using gcc44 
> and gfortran44. The software I am running is MUMPS (4.10.0). Other 
> platforms (Solaris 10) do not have this problem. Any suggestions for 
> what I should try? 
> 
> 
> Here is the stack:
> 
> #0  mca_btl_sm_component_progress () at btl_sm_component.c:387
> #1  0x00002b304a4e1f3a in opal_progress () at runtime/opal_progress.c:207
> #2  0x00002b3049e20fa5 in opal_condition_wait (count=2, 
> requests=0x7fff1376d850, statuses=0x0)
>     at ../opal/threads/condition.h:99
> #3  ompi_request_default_wait_all (count=2, requests=0x7fff1376d850, 
> statuses=0x0)
>     at request/req_wait.c:262
> #4  0x00002b304ecb4952 in ompi_coll_tuned_allreduce_intra_recursivedoubling (
>     sbuf=<value optimized out>, rbuf=0x14c9da10, count=1, 
> dtype=0x2b304a085d40, op=0x2b304a07d280,
>     comm=0x14ca34d0, module=0x14ca0500) at coll_tuned_allreduce.c:223
> #5  0x00002b3049e36384 in PMPI_Allreduce (sendbuf=0x14c9d8d0, 
> recvbuf=0x14c9da10, count=1,
>     datatype=<value optimized out>, op=0x2b304a07d280, comm=0x14ca34d0) at 
> pallreduce.c:102
> #6  0x00002b304a0b9bd3 in mpi_allreduce_f (sendbuf=0x14c9d8d0 "", 
> recvbuf=0x14c9da10 "",
>     count=0x626eb0, datatype=<value optimized out>, op=0x626ec0, comm=<value 
> optimized out>,
>     ierr=0x7fff1376e530) at pallreduce_f.c:77
> #7  0x000000000049dbd4 in dmumps_142 (id=...) at dmumps_part5.F:5570
> 
> 
> Thanks. 
> 
> William


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

