It looks like the job timed out. I'm guessing that some kind of timeout setting is being applied to the batch script but not to the interactive run.
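
One way to test that guess is to capture the settings each run actually sees and diff them. This is only a rough sketch: the function name snapshot_env and the output file names are made up for illustration, and the checks are generic (resource limits plus any SLURM_*/OMPI_* variables), not anything specific to Pegasos.

```bash
#!/bin/bash
# Hypothetical helper (not part of Slurm or Pegasos): write the settings that
# commonly differ between an interactive shell and a batch job to a file,
# so the two snapshots can be compared with diff.
snapshot_env() {
    local out="$1"
    {
        echo "== host =="
        uname -n
        echo "== ulimit -a =="
        ulimit -a
        echo "== SLURM_* and OMPI_* variables =="
        # grep exits non-zero when nothing matches; that is not an error here.
        env | grep -E '^(SLURM|OMPI)_' | sort || true
    } > "$out"
}

# Assumed usage:
#   interactive shell:        snapshot_env env.interactive.txt
#   in test.mpirun.sh, on the line before mpirun:
#                             snapshot_env "env.batch.${SLURM_JOB_ID}.txt"
#   afterwards:               diff env.interactive.txt env.batch.<jobid>.txt
```

It may also be worth running "scontrol show job <jobid>" while the job is active and checking the TimeLimit field, to rule out a wall-clock limit imposed by the partition.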

> On Jul 20, 2016, at 12:19 PM, Kwok, Patrick <patrick.k...@sunnybrook.ca> wrote:
>
> Hi SLURM gurus,
>
> I’m new to using slurm, so please excuse my lack of knowledge.
>
> We’re trying to schedule an mpi program called Pegasos, according to Elekta,
> who installed/configured Pegasos, it works with mpirun. I created a shell
> script using mpirun, and I am trying to run it on 2 node, using 20CPUs each.
>
> 1) more test.mpirun.sh
> #!/bin/bash
> #SBATCH --ntasks-per-node=20
> #SBATCH -N 2
> mpirun -np 40 PegasosMPI test.sim
>
> Running the bash script directly will finish normally. Next, I try to submit
> the job/script with sbatch
>
> 2) sbatch test.mpirun.sh
>
> It uses resources on the 2 nodes as expected, but Pegasos did not seem to run
> as no output files were generated. Here is the output for slurm-xxxx.out:
>
> “srun: cluster configuration lacks support for cpu binding
> pegasos$Receive timeout, aborting simulation [PegasosMPI.cc,274]
>
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode -1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------”
>
> Can anyone help?
>
> Thanks!
> Patrick
>
> This e-mail is intended only for the named recipient(s) and may contain
> confidential, personal and/or health information (information which may be
> subject to legal restrictions on use, retention and/or disclosure). No
> waiver of confidence is intended by virtue of communication via the internet.
> Any review or distribution by anyone other than the person(s) for whom it
> was originally intended is strictly prohibited. If you have received this
> e-mail in error, please contact the sender and destroy all copies.