All,

We are preparing to switch from our current job scheduler to Slurm
and I am running into a strange issue. I compiled Open MPI with Slurm
support, and when I start a job with sbatch and use mpirun, everything
works fine. However, when I use srun instead of mpirun and the job does
not fit on a single node, I either receive the following Open MPI warning
a number of times:
--------------------------------------------------------------------------
WARNING: Missing locality information required for sm initialization.
Continuing without shared memory support.
--------------------------------------------------------------------------
or a segmentation fault in an Open MPI library (address not mapped),
or both.
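
For reference, the batch script is essentially the sketch below; the
program name and the node/task counts are placeholders, not my exact
values:

  #!/bin/bash
  #SBATCH --nodes=2
  #SBATCH --ntasks=32

  # works fine:
  mpirun ./mpi_hello

  # warning and/or segfault once the job spans more than one node:
  srun ./mpi_hello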

I only observe this with MPI programs compiled with Open MPI and run by
srun when the job does not fit on a single node. The same program
started by Open MPI's mpirun runs fine, and the same source compiled
with MVAPICH2 works fine with srun.
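
In command form, what I am comparing is roughly this (install paths and
file names are placeholders):

  # same source, built against the two MPI stacks
  /opt/openmpi-1.8.5/bin/mpicc hello.c -o hello.ompi
  /opt/mvapich2/bin/mpicc hello.c -o hello.mvapich2

  # inside an sbatch allocation spanning two nodes:
  srun ./hello.ompi      # sm warning and/or segfault
  srun ./hello.mvapich2  # runs fine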

Some version info:
slurm 14.11.7
openmpi 1.8.5
hwloc 1.10.1 (used for both slurm and openmpi)
OS: RHEL 7.1
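
Open MPI was built with Slurm and hwloc support, i.e. a configure line
along these lines (the install prefixes are placeholders):

  # (--with-pmi points at the Slurm PMI installation; path is a placeholder)
  ./configure --prefix=/opt/openmpi-1.8.5 \
              --with-slurm \
              --with-pmi=/opt/slurm \
              --with-hwloc=/opt/hwloc-1.10.1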

Has anyone seen this warning before, and what would be a good place to
start troubleshooting?


Thank you,
Paul
