On some of my larger problems (50 or more nodes, 'long' runs of more than 5 hours),
my program stalls and does not continue. The program is set up as a master-worker
scheme, and it seems that the master gets stuck in a write to stdout; see the gdb
backtrace below (it took all day to get there on 50 nodes). The function
handle_message is simply printing to stdout in this case. The workers, of course,
keep sending data to the master, but the master is stuck in a write that never
finishes. Any idea where to look next?
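For reference, the master side looks roughly like the sketch below. This is a
simplified, self-contained version, not the actual migrate code; apart from
handle_message printing the worker's text to stdout with fputs (as in the
backtrace), all names, tags, and buffer sizes here are made-up placeholders.

/* Simplified sketch of the master loop described above -- NOT the real
 * migrate code. Only the fputs-to-stdout behaviour of handle_message()
 * reflects the actual program; everything else is a placeholder. */
#include <mpi.h>
#include <stdio.h>

#define MAXMSG        8192   /* placeholder message buffer size  */
#define FINISHED_TAG  0      /* placeholder "worker is done" tag */

static void handle_message(const char *rawmessage, int sender)
{
    (void) sender;
    /* This is the write the master is stuck in (frames #0-#5 of the
       backtrace below): a plain blocking write of the worker's text. */
    fputs(rawmessage, stdout);
}

static void master_loop(int numworkers)
{
    char       buffer[MAXMSG];
    MPI_Status status;
    int        finished = 0;

    while (finished < numworkers)
    {
        /* Block until any worker sends a message ... */
        MPI_Recv(buffer, MAXMSG, MPI_CHAR, MPI_ANY_SOURCE,
                 MPI_ANY_TAG, MPI_COMM_WORLD, &status);

        if (status.MPI_TAG == FINISHED_TAG)
            finished++;
        else
            /* ... and print it; while this write does not return, the
               master receives nothing and the workers' sends pile up. */
            handle_message(buffer, status.MPI_SOURCE);
    }
}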
Smaller runs look fine, and valgrind did not find any problems in my code (it does
complain a lot about Open MPI itself, though). I also attach the ompi_info output
to show the versions (the OS is Mac OS X 10.5.5). Any idea what is going on? Any
hint is welcome!
Thanks,
Peter
(gdb) bt
#0 0x00000037528c0e50 in __write_nocancel () from /lib64/libc.so.6
#1 0x00000037528694b3 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00000037528693c6 in _IO_new_do_write () from /lib64/libc.so.6
#3 0x000000375286a822 in _IO_new_file_xsputn () from /lib64/libc.so.6
#4 0x000000375285f4f8 in fputs () from /lib64/libc.so.6
#5 0x000000000045e9de in handle_message (
    rawmessage=0x4bb8830 "M0:[ 12] Swapping between 4 temperatures. \n",
    ' ' <repeats 11 times>, "Temperature | Accepted | Swaps between temperatures\n",
    ' ' <repeats 16 times>, "1e+06 | 0.00 | |\n",
    ' ' <repeats 15 times>, "3.0000 | 0.08 | 1 ||"...,
    sender=12, world=0x448d8b0) at migrate_mpi.c:3663
#6 0x000000000045362a in mpi_runloci_master (loci=1, who=0x4541fc0,
    world=0x448d8b0, options_readsum=0, menu=0) at migrate_mpi.c:228
#7 0x000000000044ed86 in run_sampler (options=0x448dc20, data=0x4465a10,
    universe=0x42b90c0, usize=4, outfilepos=0x7fff0ff98ee0,
    Gmax=0x7fff0ff98ee8) at main.c:885
#8 0x000000000044dff2 in main (argc=3, argv=0x7fff0ff99008) at main.c:422
petal:~>ompi_info
Open MPI: 1.2.8
Open MPI SVN revision: r19718
Open RTE: 1.2.8
Open RTE SVN revision: r19718
OPAL: 1.2.8
OPAL SVN revision: r19718
Prefix: /home/beerli/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configured by: beerli
Configured on: Mon Nov 3 15:00:02 EST 2008
Configure host: petal
Built by: beerli
Built on: Mon Nov 3 15:08:02 EST 2008
Built host: petal
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: no
MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.8)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.8)
MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.8)
MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.8)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.8)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2.8)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.8)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.8)
MCA io: romio (MCA v1.0, API v1.0, Component v1.2.8)
MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.8)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.8)
MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.8)
MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.8)
MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.8)
MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.8)
MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.8)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.8)
MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.8)
MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.8)
MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.8)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.8)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.8)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.8)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.8)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.8)
MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.8)
MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.8)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.8)
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.8)
MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.8)
MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.8)
MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.8)
MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.8)
MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.8)
MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.8)
MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.8)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.8)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.8)
MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.8)
MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.8)
MCA sds: env (MCA v1.0, API v1.0, Component v1.2.8)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.8)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.8)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.8)
MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.8)