I've built and successfully run the NASA Overflow 2.0aa program with Open MPI 1.0.2 on an Opteron Linux cluster running SLES 9 and GM 2.0.24. I then built Open MPI 1.1 with the Intel 9 compilers, and when I try to run Overflow 2.0aa over Myrinet, I get what looks like a data corruption error and the program dies quickly. There are no MPI errors at all. If I run over GigE instead (--mca btl self,tcp), the program runs to completion correctly.

Here is my ompi_info output:
bsb3227@mahler:~/openmpi_1.1/bin> ./ompi_info
                Open MPI: 1.1
   Open MPI SVN revision: r10477
                Open RTE: 1.1
   Open RTE SVN revision: r10477
                    OPAL: 1.1
       OPAL SVN revision: r10477
                  Prefix: /home/bsb3227/openmpi_1.1
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: bsb3227
           Configured on: Fri Jun 30 07:08:54 PDT 2006
          Configure host: mahler
                Built by: bsb3227
                Built on: Fri Jun 30 07:54:46 PDT 2006
              Built host: mahler
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
            C++ compiler: icpc
   C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
      Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
  Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: tm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: tm (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)

Here is the ifconfig output for one of the nodes:

bsb3227@m045:~> /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:45:5D:CD:FE
          inet addr:10.241.194.45  Bcast:10.241.195.255  Mask:255.255.254.0
          inet6 addr: fe80::250:45ff:fe5d:cdfe/64 Scope:Link
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:39913407 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48794587 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31847343907 (30371.9 Mb)  TX bytes:48231713866 (45997.3 Mb)
          Interrupt:19

eth1      Link encap:Ethernet  HWaddr 00:50:45:5D:CD:FF
          inet6 addr: fe80::250:45ff:fe5d:cdff/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:19

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:23141 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23141 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:20145689 (19.2 Mb)  TX bytes:20145689 (19.2 Mb)

I hope someone can give me some guidance on how to debug this problem. Thanks in advance for any help that can be provided.

Bernie Borenstein
The Boeing Company
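P.S. To make the two cases concrete, the launch commands look roughly like the following. This is only a sketch: the hostfile name, process count, and executable name are placeholders, and the GM btl list is my best guess for the Myrinet run; the only flag quoted exactly from above is --mca btl self,tcp.

```shell
# Myrinet/GM run (fails quickly with apparent data corruption, no MPI errors;
# the btl list "self,sm,gm" is assumed, not taken from the original post):
mpirun --hostfile myhosts -np 16 --mca btl self,sm,gm ./overflow

# GigE run (completes correctly; btl selection quoted verbatim from the post):
mpirun --hostfile myhosts -np 16 --mca btl self,tcp ./overflow
```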
Attachment: config.log.gz