All,

We are preparing to switch from our current job scheduler to Slurm, and I am running into a strange issue. I compiled Open MPI with Slurm support, and when I start a job with sbatch and use mpirun, everything works fine. However, when I use srun instead of mpirun and the job does not fit on a single node, I either receive the following Open MPI warning a number of times:

--------------------------------------------------------------------------
WARNING: Missing locality information required for sm initialization.
Continuing without shared memory support.
--------------------------------------------------------------------------

or a segmentation fault in an Open MPI library (address not mapped), or both.
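For reference, the job script looks roughly like the following (node counts and the program name are placeholders, not our actual values):

```shell
#!/bin/bash
#SBATCH --nodes=2              # the problem only appears when the job spans nodes
#SBATCH --ntasks-per-node=16

# Launching this way works fine:
mpirun ./my_mpi_program

# Launching this way instead produces the sm locality warning
# and/or a segfault:
#srun ./my_mpi_program
```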
I only observe this with MPI programs compiled with Open MPI and launched by srun when the job does not fit on a single node. The same program started by Open MPI's mpirun runs fine, and the same source compiled with MVAPICH2 works fine with srun.

Some version info:
slurm 14.11.7
openmpi 1.8.5
hwloc 1.10.1 (used for both slurm and openmpi)
OS: RHEL 7.1

Has anyone seen this warning before, and what would be a good place to start troubleshooting?

Thank you,
Paul
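P.S. In case it is useful, these are the checks I intend to run next (assuming standard Slurm and Open MPI installs; the output will of course depend on how each was configured):

```shell
# List the MPI/PMI plugins this Slurm build supports for srun
srun --mpi=list

# Check whether Open MPI was built with Slurm and PMI support
ompi_info | grep -i slurm
ompi_info | grep -i pmi
```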
