Hi Ralph, Thanks for replying!
Is that timeout coming from sbatch and being passed into the Pegasos execution shell? Is there any way to stop sbatch from sending the timeout?

Thanks,
Patrick

From: Ralph Castain [mailto:r...@open-mpi.org]
Sent: July 20, 2016 3:28 PM
To: slurm-dev
Subject: [slurm-dev] Re: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode -1

It looks like the job timed out - I’m guessing that there is some kind of timeout spec being applied to the batch script that is not being applied to the interactive execution.

On Jul 20, 2016, at 12:19 PM, Kwok, Patrick <patrick.k...@sunnybrook.ca> wrote:

Hi SLURM gurus,

I’m new to using slurm, so please excuse my lack of knowledge.

We’re trying to schedule an MPI program called Pegasos. According to Elekta, who installed and configured Pegasos, it works with mpirun. I created a shell script using mpirun, and I am trying to run it on 2 nodes, using 20 CPUs each.

1) more test.mpirun.sh

    #!/bin/bash
    #SBATCH --ntasks-per-node=20
    #SBATCH -N 2
    mpirun -np 40 PegasosMPI test.sim

Running the bash script directly finishes normally. Next, I tried to submit the script with sbatch:

2) sbatch test.mpirun.sh

It uses resources on the 2 nodes as expected, but Pegasos did not seem to run, as no output files were generated. Here is the output in slurm-xxxx.out:

    srun: cluster configuration lacks support for cpu binding
    pegasos$Receive timeout, aborting simulation [PegasosMPI.cc,274]
    --------------------------------------------------------------------------
    MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
    with errorcode -1.

    NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
    You may or may not see output from other processes, depending on
    exactly when Open MPI kills them.
    --------------------------------------------------------------------------

Can anyone help? Thanks!
Patrick

This e-mail is intended only for the named recipient(s) and may contain confidential, personal and/or health information (information which may be subject to legal restrictions on use, retention and/or disclosure). No waiver of confidence is intended by virtue of communication via the internet. Any review or distribution by anyone other than the person(s) for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and destroy all copies.