I built openmpi-1.2.6 on centos-5.2 with gcc-4.3.1 (gcc-4.3.1/bin is first in my path).
I did a tar xvzf, cd openmpi-1.2.6, mkdir obj, cd obj, and then ran:
../configure --prefix=/opt/pkg/openmpi-1.2.6 --enable-shared --enable-debug
If I look in config.log I see:
MCA_btl_ALL_COMPONENTS=' self sm gm mvapi mx openib portals tcp udapl'
MCA_btl_DSO_COMPONENTS=' self sm openib tcp'
So both openib and tcp are available, and both list many parameters under:
ompi_info --param btl tcp
ompi_info --param btl openib
Yet when I run an MPI program, I can't get it to use TCP:
# which mpirun
/opt/pkg/openmpi-1.2.6/bin/mpirun
# mpirun -mca btl ^openib -np 2 -machinefile m ./relay 1
compute-0-1.local compute-0-0.local
size= 1, 131072 hops, 2 nodes in 0.304 sec ( 2.320 us/hop) 1683 KB/sec
Or if I try the inverse:
# mpirun -mca btl self,tcp -np 2 -machinefile m ./relay 1
compute-0-1.local compute-0-0.local
size= 1, 131072 hops, 2 nodes in 0.313 sec ( 2.386 us/hop) 1637 KB/sec
A latency of 2.3 us per hop is definitely faster than GigE, so the traffic is
evidently not going over TCP. I don't have IP over IB set up: ifconfig -a shows
ib0, but it has no IP address.
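As a sanity check on the per-hop figures, here is a minimal sketch that redoes the arithmetic from the rounded values relay printed above. The helper name us_per_hop is made up for illustration, and the inputs are only the three-digit elapsed times and the hop count from the output:

```shell
# Hypothetical helper: microseconds per hop, given elapsed seconds and hop count.
us_per_hop() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "%.3f\n", t * 1e6 / n }'
}

us_per_hop 0.304 131072   # the "-mca btl ^openib" run
us_per_hop 0.313 131072   # the "-mca btl self,tcp" run
```

Both results land at roughly 2.3 to 2.4 us, matching what relay reports; the last digit differs slightly because the elapsed time is printed to only three decimals.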
I removed all other openib implementations (infinipath came with one) before I
compiled, and the binary seems to be linked against the right libraries:
# ldd ./relay
libmpi.so.0 => /opt/pkg/openmpi-1.2.6/lib/libmpi.so.0 (0x00002aaaaacc7000)
libopen-rte.so.0 => /opt/pkg/openmpi-1.2.6/lib/libopen-rte.so.0 (0x00002aaaaafb5000)
libopen-pal.so.0 => /opt/pkg/openmpi-1.2.6/lib/libopen-pal.so.0 (0x00002aaaab23d000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab4b2000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaab6b6000)
libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab8ce000)
libm.so.6 => /lib64/libm.so.6 (0x00002aaaabad2000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabd55000)
libc.so.6 => /lib64/libc.so.6 (0x00002aaaabf6f000)
/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
Can anyone suggest what to look into?