It looks like the job timed out. I’m guessing that some kind of time limit is 
being applied to the batch job that is not applied when the script is run 
interactively.
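One way to test that guess is to ask SLURM's accounting how the job ended. The helper below is a minimal sketch. It assumes job accounting is enabled so that `sacct` keeps a record of finished jobs, and the function name `job_hit_time_limit` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: exits 0 if SLURM recorded the job's final state as TIMEOUT.
# Assumes job accounting is enabled so sacct can report on finished jobs.
job_hit_time_limit() {
    local jobid=$1
    local state
    # -X: report the job allocation only, not its individual steps
    # -n: omit the header line; -P: parsable output without column padding
    state=$(sacct -j "$jobid" -X -n -P --format=State)
    [ "$state" = "TIMEOUT" ]
}
```

For example, `job_hit_time_limit 12345 && echo "hit time limit"`, where 12345 stands in for the job number in slurm-xxxx.out. Running `sacct -j 12345 --format=JobID,State,Elapsed,Timelimit` shows the same state next to the elapsed time and the limit. If the state is something other than TIMEOUT (for example FAILED), the timeout is more likely the "Receive timeout" reported by Pegasos itself.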

> On Jul 20, 2016, at 12:19 PM, Kwok, Patrick <patrick.k...@sunnybrook.ca> 
> wrote:
> 
> Hi SLURM gurus,
>  
> I’m new to SLURM, so please excuse my lack of knowledge.
>  
> We’re trying to schedule an MPI program called Pegasos. According to Elekta, 
> who installed and configured Pegasos, it works with mpirun. I created a shell 
> script that uses mpirun, and I am trying to run it on 2 nodes, using 20 CPUs 
> each.
>  
> 1) more test.mpirun.sh
> #!/bin/bash
> #SBATCH --ntasks-per-node=20
> #SBATCH -N 2
> mpirun -np 40 PegasosMPI test.sim
>  
> Running the bash script directly finishes normally. Next, I tried to submit 
> the job script with sbatch:
>  
> 2) sbatch test.mpirun.sh
>  
> It allocates resources on the 2 nodes as expected, but Pegasos does not seem 
> to run, as no output files are generated. Here is the output in slurm-xxxx.out:
>  
> “srun: cluster configuration lacks support for cpu binding
> pegasos$Receive timeout, aborting simulation [PegasosMPI.cc,274]
>  
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode -1.
>  
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------“
>  
> Can anyone help?
>  
> Thanks!
> Patrick
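If a SLURM time limit does turn out to be the cause, one option is to request wall-clock time explicitly in test.mpirun.sh. The version below is a sketch: the 02:00:00 value is a placeholder, since the run time Pegasos needs is not known here, and the site's partitions may cap how much time a job can request.

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=20
# Explicit wall-clock limit (hours:minutes:seconds). 02:00:00 is a placeholder;
# without this line, the partition's default time limit applies.
#SBATCH --time=02:00:00
mpirun -np 40 PegasosMPI test.sim
```

sbatch only reads `#SBATCH` lines that appear before the first executable command, so the new directive has to stay above the mpirun line.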
