Thanks for the tip!
We actually already have a setup where

    srun --ntasks=$SLURM_JOB_NUM_NODES /bin/true

is run at the start of every job, so we're definitely going to look into this.
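For reference, roughly what that looks like at the top of one of our batch
scripts (a sketch only; the application line is a placeholder):

    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --time=01:00:00

    # Launch a no-op across the allocation first; a hung or unreachable
    # node makes this step fail fast instead of the real run.
    srun --ntasks=$SLURM_JOB_NUM_NODES /bin/true

    # ... then the actual work:
    srun ./my_application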
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Hello,
we are having an issue with SLURM killing jobs because of virtual
memory limits:

    slurmstepd[46530]: error: Job 784 exceeded virtual memory limit
    (416329820 211812352), being killed

The problem is that the job above has actually negligible heap use,
*but* it allocates a SysV shared memory segment of about 100GB.
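For what it's worth, the double-counting is easy to see from the shell
(a sketch; it assumes the job's tasks are still running):

    # list SysV shared memory segments, their owners and sizes
    ipcs -m

    # the VSZ of every process attached to the segment includes the full
    # segment size, so a 100GB segment inflates each task's VSZ by 100GB
    ps -o pid,vsz,rss,comm -u $USER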
On 20/09/14 02:02, Riccardo Murri wrote:
The problem is that the job above has actually negligible heap use,
*but* it allocates a SysV shared memory segment of about 100GB. It
seems that the size of this shared memory segment is counted towards
*all* 4 processes in the job, instead of being counted only once.
Riccardo,
Your configuration is very close to ours, and this is an issue we're facing
too. The VSizeFactor=101 is also what we use, and is something we may set back
to 0 because of issues with how SLURM treats memory. We are on 14.03.6 and
so far it seems that SLURM does not handle memory accounting for shared
segments correctly.
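For anyone else reading, this is the knob in question (a sketch of the
relevant slurm.conf lines):

    # slurm.conf
    # Enforce a virtual memory limit of 101% of the requested real memory.
    # Setting VSizeFactor=0 disables virtual memory enforcement entirely.
    VSizeFactor=101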
I've been documenting for my users how to move from Torque to SLURM and what
that means for running MPI jobs. Based on the SLURM documentation I've come up
with the following:
In slurm.conf:
MpiDefault=none
MpiParams=ports=3-3
Then users run...
OpenMPI:
srun --mpi=openmpi
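A complete user-side job script then looks roughly like this (the
executable name is just a placeholder):

    #!/bin/bash
    #SBATCH --ntasks=16

    # --mpi=openmpi overrides the MpiDefault=none from slurm.conf
    srun --mpi=openmpi ./mpi_program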
AFAIK you don't need resv-ports with OpenMPI PMI2 (it works for us anyway),
and you can also set the SLURM_MPI_TYPE environment variable in your MPI
environment modules so users can run srun /path/to/executable whether
it's OpenMPI or MVAPICH2.
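Concretely, something like this in the module (sketched here; pmi2 assumes
an MPI build with PMI2 support):

    # in the MPI modulefile (Tcl syntax):
    setenv SLURM_MPI_TYPE pmi2

    # users then simply run, regardless of the MPI flavor:
    srun /path/to/executable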
On Fri, Sep 19, 2014 at 6:44 PM, Trey Dockendorf wrote:
Thanks for confirming. The idea of setting the environment variable is a good
one, thanks!
- Trey
=============================================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: