Hi Ralph,

Thanks for replying!

Is that timeout coming from sbatch and being passed into the Pegasos execution
shell? Is there any way to stop sbatch from sending the timeout?

Thanks,
Patrick

From: Ralph Castain [mailto:r...@open-mpi.org]
Sent: July 20, 2016 3:28 PM
To: slurm-dev
Subject: [slurm-dev] Re: MPI_ABORT was invoked on rank 0 in communicator 
MPI_COMM_WORLD with errorcode -1

It looks like the job timed out. I’m guessing that some kind of timeout spec
is being applied to the batch script that is not being applied to the
interactive execution.
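If a SLURM time limit is behind the abort, as suggested above, a first step is to see what limit SLURM actually applied to the batch job. The sketch below uses a hypothetical helper, slurm_time_to_seconds (not a SLURM command), that converts a time-limit string in the formats sbatch accepts for --time into seconds, so it can be compared with how long the interactive run takes. The scontrol/grep lines at the end assume the TimeLimit= field appears in the `scontrol show job` output, and they only run inside a job.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): convert a SLURM time-limit string
# ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours",
# "days-hours:minutes", "days-hours:minutes:seconds") into seconds.
# "UNLIMITED"/"INFINITE" are printed unchanged.
slurm_time_to_seconds() {
    local t=$1 days=0 rest h=0 m=0 s=0
    local -a parts
    if [[ $t == UNLIMITED || $t == INFINITE ]]; then
        echo "$t"
        return 0
    fi
    if [[ $t == *-* ]]; then
        # days-hours[:minutes[:seconds]]
        days=${t%%-*}
        rest=${t#*-}
        IFS=: read -r -a parts <<< "$rest"
        h=${parts[0]:-0}; m=${parts[1]:-0}; s=${parts[2]:-0}
    else
        IFS=: read -r -a parts <<< "$t"
        case ${#parts[@]} in
            1) m=${parts[0]} ;;                               # minutes
            2) m=${parts[0]}; s=${parts[1]} ;;                # minutes:seconds
            3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;; # hours:minutes:seconds
            *) echo "unrecognised time limit: $t" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10 so fields such as "08" are not parsed as octal.
    echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# Example use inside a batch script: report the limit SLURM applied to this job.
if [[ -n ${SLURM_JOB_ID:-} ]]; then
    limit=$(scontrol show job "$SLURM_JOB_ID" | grep -o 'TimeLimit=[^ ]*' | cut -d= -f2)
    echo "TimeLimit=$limit ($(slurm_time_to_seconds "$limit") s)"
fi
```

Timing the interactive `mpirun` with `time` and comparing it against this number shows whether the batch run could plausibly be hitting the limit.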

On Jul 20, 2016, at 12:19 PM, Kwok, Patrick
<patrick.k...@sunnybrook.ca> wrote:

Hi SLURM gurus,

I’m new to using slurm, so please excuse my lack of knowledge.

We’re trying to schedule an MPI program called Pegasos. According to Elekta,
who installed and configured Pegasos, it works with mpirun. I created a shell
script that uses mpirun, and I am trying to run it on 2 nodes, using 20 CPUs each.

1) more test.mpirun.sh
#!/bin/bash
#SBATCH --ntasks-per-node=20
#SBATCH -N 2
mpirun -np 40 PegasosMPI test.sim
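If a SLURM time limit is involved, the wall-clock limit can be requested explicitly in the script header; when --time is omitted, sbatch uses the partition's default time limit. A minimal sketch of the header, assuming the site allows a 2-hour limit (the value is an arbitrary placeholder, and a request above the partition's maximum leaves the job pending):

```shell
#!/bin/bash
#SBATCH --ntasks-per-node=20
#SBATCH -N 2
# Placeholder wall-clock limit (2 hours); adjust to the expected run length
# and to the partition's configured maximum.
#SBATCH --time=02:00:00
# ... the rest of test.mpirun.sh (the mpirun line) stays unchanged.
```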

Running the script directly with bash finishes normally. Next, I submitted the
job script with sbatch:

2) sbatch test.mpirun.sh

It used resources on the 2 nodes as expected, but Pegasos did not seem to run,
as no output files were generated. Here is the output in slurm-xxxx.out:

“srun: cluster configuration lacks support for cpu binding
pegasos$Receive timeout, aborting simulation [PegasosMPI.cc,274]

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------“

Can anyone help?

Thanks!
Patrick
This e-mail is intended only for the named recipient(s) and may contain 
confidential, personal and/or health information (information which may be 
subject to legal restrictions on use, retention and/or disclosure).  No waiver 
of confidence is intended by virtue of communication via the internet.  Any 
review or distribution by anyone other than the person(s) for whom it was 
originally intended is strictly prohibited.  If you have received this e-mail 
in error, please contact the sender and destroy all copies.