Hi,
I'm not sure whether this problem is with SLURM or OpenMPI, but the stack
traces (below) point to an issue within OpenMPI.
Whenever I try to launch an MPI job within SLURM, mpirun immediately crashes
with a segmentation fault -- but only if the node that SLURM allocated to the
job is different from the one on which I launched it.
However, if I force SLURM to allocate only the local node (i.e. the one on
which salloc was called), everything works fine.
Failing case:
michael@ipc ~ $ salloc -n8 mpirun --display-map ./mpi
======================== JOB MAP ========================
Data for node: Name: ipc4 Num procs: 8
Process OMPI jobid: [21326,1] Process rank: 0
Process OMPI jobid: [21326,1] Process rank: 1
Process OMPI jobid: [21326,1] Process rank: 2
Process OMPI jobid: [21326,1] Process rank: 3
Process OMPI jobid: [21326,1] Process rank: 4
Process OMPI jobid: [21326,1] Process rank: 5
Process OMPI jobid: [21326,1] Process rank: 6
Process OMPI jobid: [21326,1] Process rank: 7
=============================================================
[ipc:16986] *** Process received signal ***
[ipc:16986] Signal: Segmentation fault (11)
[ipc:16986] Signal code: Address not mapped (1)
[ipc:16986] Failing at address: 0x801328268
[ipc:16986] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7ff85c7638f0]
[ipc:16986] [ 1] /usr/lib/libopen-rte.so.0(+0x3459a) [0x7ff85d4a059a]
[ipc:16986] [ 2] /usr/lib/libopen-pal.so.0(+0x1eeb8) [0x7ff85d233eb8]
[ipc:16986] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7ff85d228439]
[ipc:16986] [ 4] /usr/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0x9d) [0x7ff85d4a002d]
[ipc:16986] [ 5] /usr/lib/openmpi/lib/openmpi/mca_plm_slurm.so(+0x211a) [0x7ff85bbc311a]
[ipc:16986] [ 6] mpirun() [0x403c1f]
[ipc:16986] [ 7] mpirun() [0x403014]
[ipc:16986] [ 8] /lib/libc.so.6(__libc_start_main+0xfd) [0x7ff85c3efc4d]
[ipc:16986] [ 9] mpirun() [0x402f39]
[ipc:16986] *** End of error message ***
Non-failing case:
michael@eng-ipc4 ~ $ salloc -n8 -w ipc4 mpirun --display-map ./mpi
======================== JOB MAP ========================
Data for node: Name: eng-ipc4.FQDN Num procs: 8
Process OMPI jobid: [12467,1] Process rank: 0
Process OMPI jobid: [12467,1] Process rank: 1
Process OMPI jobid: [12467,1] Process rank: 2
Process OMPI jobid: [12467,1] Process rank: 3
Process OMPI jobid: [12467,1] Process rank: 4
Process OMPI jobid: [12467,1] Process rank: 5
Process OMPI jobid: [12467,1] Process rank: 6
Process OMPI jobid: [12467,1] Process rank: 7
=============================================================
Process 1 on eng-ipc4.FQDN out of 8
Process 3 on eng-ipc4.FQDN out of 8
Process 4 on eng-ipc4.FQDN out of 8
Process 6 on eng-ipc4.FQDN out of 8
Process 7 on eng-ipc4.FQDN out of 8
Process 0 on eng-ipc4.FQDN out of 8
Process 2 on eng-ipc4.FQDN out of 8
Process 5 on eng-ipc4.FQDN out of 8
Using mpirun directly (outside of SLURM) is fine, e.g.
mpirun -H 'ipc3,ipc4' -np 8 ./mpi
works as expected.
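
For reference, ./mpi is nothing exotic; a minimal program that matches the
output above (the actual source may differ slightly) is:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);
        /* print rank, host name and world size, as in the output above */
        printf("Process %d on %s out of %d\n", rank, name, size);
        MPI_Finalize();
        return 0;
    }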
This is a (small) homogeneous cluster: all Xeon-class machines with plenty of
RAM and a shared filesystem over NFS, running 64-bit Ubuntu server. I was
running the stock OpenMPI (1.4.1) and SLURM (2.1.1); I have since upgraded to
the latest stable OpenMPI (1.4.3) and SLURM (2.2.0), with no change. (The newer
binaries were compiled from the respective upstream Debian packages.)
strace (not shown) shows that the job is launched via srun and a connection is
received back from the child process over TCP/IP. Soon after this, mpirun
crashes. Nodes communicate over a semi-dedicated TCP/IP GigE connection.
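
If it would help, I can re-run with more verbose output from the SLURM
launcher, e.g. something along the lines of:

    salloc -n8 mpirun --mca plm_base_verbose 5 --debug-daemons --display-map ./mpi

and post the resulting output.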
Is this a known bug? What is going wrong?
Regards,
Michael Curtis