Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread Gilles Gouaillardet
Marcin, you can also try excluding the public subnet(s) (e.g. 1.2.3.0/24) and the loopback interface, instead of including em4, which does not exist on the compute nodes. Or you can include only the private subnet(s) that are common to the frontend and compute nodes. Cheers, Gilles On Saturday, September 24,
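For illustration, the two options above could look roughly like the following. This is only a sketch: it assumes the exclude counterparts of the parameters mentioned elsewhere in this thread (btl_tcp_if_exclude, oob_tcp_if_exclude), reuses 1.2.3.0/24 as a placeholder for the public subnet, and 10.1.0.0/16 is a made-up private subnet that has to be replaced with the real one. The script only prints the command lines; it does not launch anything.

```shell
#!/bin/sh
# Sketch only: prints candidate mpirun command lines instead of running them.
# Assumptions: 1.2.3.0/24 is the placeholder public subnet from the message;
# 10.1.0.0/16 is a hypothetical private subnet, not taken from the thread.
public_subnet="1.2.3.0/24"
private_subnet="10.1.0.0/16"

# Option 1: exclude the public subnet and the loopback interface (lo)
# for both the TCP BTL and the OOB.
exclude_args="--mca btl_tcp_if_exclude ${public_subnet},lo --mca oob_tcp_if_exclude ${public_subnet},lo"

# Option 2: include only the private subnet shared by frontend and compute nodes.
include_args="--mca btl_tcp_if_include ${private_subnet} --mca oob_tcp_if_include ${private_subnet}"

echo "mpirun ${exclude_args} --mca btl tcp,self ls"
echo "mpirun ${include_args} --mca btl tcp,self ls"
```

Use one option or the other, not both at once: the include and exclude lists for the same component are meant to be mutually exclusive.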

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread marcin.krotkiewski
Thanks for the quick answer, Ralph! This does not work, because em4 is only defined on the frontend node. Now I get errors from the compute nodes: [compute-1-4.local:12206] found interface lo [compute-1-4.local:12206] found interface em1 [compute-1-4.local:12206] mca: base: components_open:

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread r...@open-mpi.org
This isn’t an issue with the SLURM integration - the problem is that our OOB is not picking the right subnet for connecting back to mpirun. In this specific case, you probably want -mca btl_tcp_if_include em4 -mca oob_tcp_if_include em4 since it is the em4 network that ties the

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread Marcin Krotkiewski
Hi, I have stumbled upon a similar issue, so I wonder whether the two might be related. On one of our systems I get the following error message, both when using openmpi 1.8.8 and 1.10.4: $ mpirun -debug-daemons --mca btl tcp,self --mca mca_base_verbose 100 --mca btl_base_verbose 100 ls [...]

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-04-01 Thread Jeff Squyres (jsquyres)
Ralph -- What's the state of PMI integration with SLURM in the v1.10.x series? (I haven't kept up with SLURM's recent releases to know if something broke between existing Open MPI releases and their new releases...?) > On Mar 31, 2016, at 4:24 AM, Tommi T wrote: > >

[OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-03-31 Thread Tommi T
Hi, stack: el6.7, mlnx ofed 3.1 (IB FDR) and slurm 15.08.9 (without *.la libs). problem: OpenMPI 1.10.x built with pmi support does not work when trying to use the sbatch/salloc + mpirun combination. srun ompi_mpi_app works fine. The older 1.8.x version works fine under the same salloc session.