 Hi,
 I came across the exact same trio of messages today:
     exe: MPI_Init: mpid exited
     exe: MPI_Init: Can't setup shared memory
     exe: MPI_Init: Cannot set srun startup protocol
     srun: error: <blah blah> Exited with exit code 1
     srun: Terminating job step <blah>
 This happened while trying to run ANSYS 14.5 Autodyn (which MUST
   use Platform MPI 8) on the cluster.
 Platform MPI (formerly HP-MPI) has integrated support for
   SLURM: adding "-srun" makes mpirun launch through srun. That seems
   great, except that Autodyn relies on the -f appfile
   feature to set up some bizarre arrangement of a master process and
   slave processes.
 The translation of the mpirun command (even with export MPI_USESRUN=1)
   into an srun command is incorrect: only the master process is invoked
   with srun.
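 To make the mismatch concrete: a complete translation would have to give
   every appfile entry its own range of ranks, for example in the format that
   srun's --multi-prog option reads. The Python sketch below shows that
   mapping purely as an illustration. It assumes appfile lines of the form
   "-h host -np N program args", the function name appfile_to_multiprog is
   hypothetical, the host is ignored because SLURM does its own placement,
   and none of this has been tested with Autodyn.

```python
import shlex


def appfile_to_multiprog(appfile_text):
    """Map appfile entries to srun --multi-prog lines.

    Assumed appfile syntax (an assumption, not verified against Autodyn):
        [-h host] [-np N] program [args...]
    Returns (config_text, total_tasks).
    """
    config_lines = []
    next_rank = 0
    for raw in appfile_text.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comments
        tokens = shlex.split(raw)
        count = 1
        i = 0
        # Consume the leading options; only -h and -np are understood here.
        while i < len(tokens) and tokens[i].startswith("-"):
            if tokens[i] == "-np":
                count = int(tokens[i + 1])
                i += 2
            elif tokens[i] == "-h":
                i += 2  # host is dropped: SLURM decides placement
            else:
                raise ValueError(f"unsupported appfile option: {tokens[i]}")
        command = tokens[i:]
        if not command:
            raise ValueError(f"no program in appfile line: {raw!r}")
        first, last = next_rank, next_rank + count - 1
        ranks = str(first) if count == 1 else f"{first}-{last}"
        config_lines.append(f"{ranks} {shlex.join(command)}")
        next_rank += count
    return "\n".join(config_lines) + "\n", next_rank
```

 The resulting text could be saved to a file and passed to srun together
   with a matching task count (-n) and --multi-prog. Whether Platform MPI's
   startup accepts being launched that way is exactly the open question.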
 I still haven't figured out how to resolve this.
 If anyone can help I'd greatly appreciate it.
 On 06/02/2013 07:05, Michael Colonno wrote:
     Hi ~

     I'm trying to run a commercial application (one I didn't compile).
     When I try to launch it on four cores, completely outside of SLURM,
     I get this:
      
     [mike@node9 test]$ /path/to/application -dis -np 4 -b < test.input
     exe: MPI_Init: mpid exited
     exe: MPI_Init: Can't setup shared memory
     exe: MPI_Init: Cannot set srun startup protocol
     srun: error: node1: task 0: Exited with exit code 1
     srun: Terminating job step 249.0
     srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
     slurmd[node1]: *** STEP 249.0 KILLED AT 2013-02-05T20:10:22 WITH SIGNAL 9 ***
     slurmd[node1]: *** STEP 249.0 KILLED AT 2013-02-05T20:10:22 WITH SIGNAL 9 ***
     srun: error: node1: tasks 1-3: Killed
      
     This behavior has a few confusing aspects. I have a version of MPICH2
     compiled and linked to SLURM elsewhere on the system, but it is not in
     my PATH. This application is not linked to that MPI implementation (it
     includes its own). It seems that not only is this application trying
     to run through SLURM, despite not being launched with srun or sbatch,
     but its execution is attempted on a different system than the one it
     was launched from (and one that I did not specify). I’m not certain
     exactly what to ask here. I suppose the first question is: how can I
     run this application without any cross-talk between it and SLURM?
     Ultimately I do want to run this through SLURM, but since I didn’t
     compile it, it’s probably best to let it use its own MPI
     implementation (treat it like a black box). This behavior does not
     occur when running the application in single-threaded mode.
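     One way to test for cross-talk might be to launch the application with
     the SLURM-related variables removed from its environment. Below is a
     minimal Python sketch of that idea. It assumes the bundled MPI is being
     steered by SLURM_* variables or by MPI_USESRUN, which is only a guess:
     it may just as well find srun through PATH or its own configuration.
     The helper names scrub_env and run_without_slurm are hypothetical.

```python
import os
import subprocess


def scrub_env(env):
    """Return a copy of env without SLURM_* variables and MPI_USESRUN."""
    return {
        key: value
        for key, value in env.items()
        if not key.startswith("SLURM_") and key != "MPI_USESRUN"
    }


def run_without_slurm(cmd, stdin_path=None):
    """Run cmd (a list of strings) in the scrubbed environment.

    If stdin_path is given, the file is fed to the command's standard
    input, like `< test.input` on the shell command line.
    Returns the command's exit code.
    """
    env = scrub_env(dict(os.environ))
    if stdin_path is None:
        return subprocess.run(cmd, env=env).returncode
    with open(stdin_path, "rb") as stdin_file:
        return subprocess.run(cmd, env=env, stdin=stdin_file).returncode


# Example call (not executed here), mirroring the command line above:
# run_without_slurm(["/path/to/application", "-dis", "-np", "4", "-b"],
#                   stdin_path="test.input")
```

     If the srun messages still appear with a scrubbed environment, the
     trigger is presumably somewhere else, such as the application's own
     launch scripts.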
      
     Thanks,
     ~Mike C.
      
