Hi, I have been trying for the past few days to get an MPI application (the Pallas benchmark, PMB) to run with Open MPI and the openib btl.
My environment:
===============
. two quad-cpu hosts with one mlx hca each.
. the hosts are running suse10 (kernel 2.6.13) with the latest (or close to it) from openib (rev 4904, specifically).
. opensm runs on a third machine with the same os.
. openmpi is built from openmpi-1.1a1r8727.tar.bz2

Behaviour:
==========
. openib seems to behave ok (ipoib works, rdma_bw and rdma_lat work, osm works).
. I can mpirun any non-mpi program like ls, hostname, or ompi_info all right.
. I can mpirun the pallas bm on any single host (the local one or the other).
. I can mpirun the pallas bm on the two nodes provided that I disable the openib btl.
. If I try to use the openib btl, the bm does not start (at best I get the initial banner, sometimes not). On both hosts, I see that the PMB processes (the correct number for each host) use 99% cpu.

I obtained the exact same behaviour with the following src packages:
openmpi-1.0.1.tar.bz2
openmpi-1.0.2a3r8706.tar.bz2
openmpi-1.1a1r8727.tar.bz2

Earlier on, I also did the same experiment with openmpi-1.0.1 and the stock gen2 of the suse kernel; same thing.

Configuration:
==============
For building, I tried the following variants:
./configure --prefix=/opt/ompi --enable-mpi-threads --enable-progress-thread
./configure --prefix=/opt/ompi
./configure --prefix=/opt/ompi --disable-smp-locks

I also tried many variations to mca-params.conf. What I normally use for trying openib is:
rmaps_base_schedule_policy = node
btl = ^tcp
mpi_paffinity_alone = 1

The mpirun cmd I normally use is:
mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines -np 2 PMB-MPI1

My machine file being:
bench1 slots=4 max-slots=4
bench2 slots=4 max-slots=4

Am I doing something obviously wrong?

Thanks for any help!

--
Jean-Christophe Hugly <j...@pantasys.com>
PANTA
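P.S. In case it helps to narrow things down, below is a minimal ping-pong program (just a sketch I wrote for this mail, not something from the runs above; the file name pingpong.c, the buffer size, and the 100-iteration loop are arbitrary) that exercises plain MPI_Send/MPI_Recv between the two nodes. If it hangs the same way PMB does whenever the openib btl is in use, the problem is not specific to the benchmark.

/* pingpong.c -- minimal two-rank send/recv test.
 * Build: mpicc -o pingpong pingpong.c
 * Run:   mpirun -prefix /opt/ompi -machinefile /root/machines -np 2 ./pingpong
 */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    char buf[1024];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "need at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    memset(buf, 0, sizeof(buf));

    /* Bounce a 1 KB buffer back and forth between rank 0 and rank 1. */
    for (i = 0; i < 100; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            if (i % 10 == 0)
                printf("iteration %d ok\n", i);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

When running it I would force the interconnect explicitly with "-mca btl openib,self" rather than my usual "btl = ^tcp", so that tcp cannot silently be picked up if openib fails to initialise.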