Hi,
I came across the exact same trio of messages today:
exe: MPI_Init: mpid exited
exe: MPI_Init: Can't setup shared memory
exe: MPI_Init: Cannot set srun startup protocol
srun: error: <blah blah> Exited with exit code 1
srun: Terminating job step <blah>
This happened while trying to run ANSYS 14.5 AUTODYN (which MUST use Platform MPI 8) on the cluster.
Platform MPI (formerly HP-MPI) has integrated support for SLURM: adding "-srun" to mpirun makes it launch through srun. That seems great, except that AUTODYN relies on the "-f appfile" feature to set up an unusual arrangement of one master process and several slave processes. The translation of that mpirun command into an srun command (even with export MPI_USESRUN=1) is incorrect: only the master process is launched with srun.
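For illustration only, here is a minimal Python sketch of the per-line mapping that seems to get lost: each line of a Platform MPI appfile becomes a rank range in the configuration file format read by Slurm's `srun --multi-prog` (`<ranks> <program> [args]`). It assumes a simplified appfile where every line is `[-h host] -np N program [args]`; the helper `appfile_to_multiprog` and the program names `./master` and `./slave` are hypothetical, and whether Platform MPI's startup protocol would accept processes started this way is untested.

```python
import shlex


def appfile_to_multiprog(appfile_text):
    """Map a simplified appfile onto srun --multi-prog config lines.

    Hypothetical helper. Assumes each non-blank line has the form
    "[-h host] -np N program [args...]". Host names are dropped because
    srun places tasks within the allocation; other options raise ValueError.
    Returns (config_lines, total_tasks).
    """
    config_lines = []
    next_rank = 0
    for raw in appfile_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        tokens = shlex.split(line)
        nprocs = 1  # assumption: one process when -np is absent
        i = 0
        # Consume leading "-opt value" pairs until the program name.
        while i < len(tokens) and tokens[i].startswith("-"):
            opt = tokens[i]
            if opt not in ("-np", "-h") or i + 1 >= len(tokens):
                raise ValueError(f"unsupported or incomplete option: {opt!r}")
            if opt == "-np":
                nprocs = int(tokens[i + 1])
            i += 2
        command = tokens[i:]
        if not command or nprocs < 1:
            raise ValueError(f"cannot convert line: {raw!r}")
        # Give this appfile line its own contiguous block of ranks.
        first, last = next_rank, next_rank + nprocs - 1
        ranks = str(first) if first == last else f"{first}-{last}"
        config_lines.append(" ".join([ranks] + [shlex.quote(t) for t in command]))
        next_rank = last + 1
    return config_lines, next_rank


if __name__ == "__main__":
    # Hypothetical two-line appfile: one master, three slaves.
    example = "-np 1 ./master input.dat\n-h node2 -np 3 ./slave\n"
    lines, ntasks = appfile_to_multiprog(example)
    print("\n".join(lines))  # 0 ./master input.dat, then 1-3 ./slave
    print(ntasks)  # 4
```

Under those assumptions, the printed lines would be saved to a file (say `multi.conf`, a hypothetical name) and started with `srun -n 4 --multi-prog multi.conf`. Dropping the `-h` hosts is deliberate: under SLURM, placement comes from the allocation rather than the appfile.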
I still haven't figured out how to resolve this.
If anyone can help I'd greatly appreciate it.
On 06/02/2013 07:05, Michael Colonno wrote:
Hi ~
I'm trying to run a commercial application (one I didn't compile). When I try to launch it on four cores, completely outside of SLURM, I get this:
[mike@node9 test]$ /path/to/application -dis -np 4 -b < test.input
exe: MPI_Init: mpid exited
exe: MPI_Init: Can't setup shared memory
exe: MPI_Init: Cannot set srun startup protocol
srun: error: node1: task 0: Exited with exit code 1
srun: Terminating job step 249.0
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[node1]: *** STEP 249.0 KILLED AT 2013-02-05T20:10:22 WITH SIGNAL 9 ***
slurmd[node1]: *** STEP 249.0 KILLED AT 2013-02-05T20:10:22 WITH SIGNAL 9 ***
srun: error: node1: tasks 1-3: Killed
This behavior has a few confusing aspects. I have a version of MPICH2 compiled and linked against SLURM elsewhere on the system, but it is not in my PATH, and this application is not linked to that MPI implementation (it includes its own). Yet it seems that not only is the application trying to run through SLURM, despite not being launched with srun or sbatch, but its execution is also attempted on a different node than the one it was launched from (node1 rather than node9), and one that I did not specify.
I'm not certain exactly what to ask here. I suppose the first question is: how can I run this application without any cross-talk between it and SLURM? Ultimately I do want to run it through SLURM, but since I didn't compile it, it's probably best to let it use its own MPI implementation and treat it as a black box. This behavior does not occur when running the application in single-threaded mode.
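One way to start narrowing down the cross-talk is to check which SLURM- and MPI-related variables the shell passes to the application, since an MPI library with built-in SLURM support may pick its launcher from the environment. That is an assumption about the bundled MPI, not something confirmed in this thread. The minimal Python sketch below only lists candidate variables whose names start with `SLURM_` or `MPI_` (which would include the `MPI_USESRUN` mentioned in the reply above); the helper name `launcher_related_vars` is hypothetical, and the script changes nothing.

```python
import os


def launcher_related_vars(env, prefixes=("SLURM_", "MPI_")):
    """Return sorted (name, value) pairs whose names start with any prefix.

    The prefix list is an assumption: SLURM_ catches variables a Slurm
    job or step may have exported, and MPI_ catches Platform MPI style
    settings such as MPI_USESRUN. Extend it for other MPI libraries.
    """
    return sorted(
        (name, value)
        for name, value in env.items()
        if name.startswith(tuple(prefixes))
    )


if __name__ == "__main__":
    # Run from the same shell used to launch the application.
    for name, value in launcher_related_vars(os.environ):
        print(f"{name}={value}")
```

If this prints, say, a leftover `SLURM_` variable or `MPI_USESRUN=1` on node9, that would be a lead worth following; an empty result would point elsewhere.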
Thanks,
~Mike C.