Hi! I finally installed OpenMPI 1.0.2-a7 with libibverbs-1.0-rc5 and libmthca-1.0-rc5 on Debian sarge with kernel 2.6.15 (from www.backports.org) in order to use InfiniBand.
While InfiniBand seems to be working (ping with IPoIB works perfectly), the mpirun/orterun command causes trouble using rsh as well as ssh.

The /usr/local/etc/openmpi-default-hostfile contains

    node01 slots=2
    node02 slots=2

Both hosts are completely identical (apart from network config) and the problem is symmetric. Although I can execute commands (all on node01) like

    $ mpirun -np 1 hostname
    node01

and

    $ rsh node02 hostname
    node02

the command

    $ mpirun -np 4 hostname
    node01
    node01

hangs. After pressing Ctrl+C it stops, but gives no hint about the cause of the problem. An output of

    $ mpirun --debug -np 4 hostname

can be found in the attachment. The important line seems to be

    [node02:12018] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113

Unfortunately, I don't know what errno=113 means, but obviously it's a TCP problem. It doesn't seem to matter if orted runs or not. No processes are launched on the remote host.

Thanks,
Emanuel
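Note on the errno question: errno numbers are platform-specific, but on Linux 113 is EHOSTUNREACH ("No route to host"), which would suggest that the connect() from node02 was aimed at an address it cannot reach. The following is only a minimal sketch for decoding such a number on the machine that reported it, assuming Python is installed there; the helper name describe_errno is made up for illustration and is not part of Open MPI.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as "EHOSTUNREACH".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message for the same value.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 113 is the value from the mca_oob_tcp_peer_complete_connect line.
    name, message = describe_errno(113)
    print(f"errno=113 -> {name}: {message}")
```

Run it on the node that printed the error (node02 here), since the same number can mean something different on another operating system.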
config.log.gz
Description: GNU Zip compressed data
mpirun_debug.out.gz
Description: GNU Zip compressed data
ompi_info.out.gz
Description: GNU Zip compressed data