Nope, neither is set. My slurm.conf is very basic (I've been using it
for several versions):
# COMPUTE NODES
NodeName=node[1-8] Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=IDLE
PartitionName=all Nodes=node[1-8] Default=YES MaxTime=INFINITE State=UP
Perhaps a system-level limit or something not set in the slurm
init.d script? This all looks pretty normal:
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256422
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 256422
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
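
One caveat on the output above: `ulimit -a` reports the limits of the interactive shell, which are not necessarily the limits the daemon inherited from its init script. A minimal sketch for reading the limits of an already-running process, assuming Linux with /proc mounted; the helper name `proc_limits` and the process name `slurmd` in the example are illustrative assumptions:

```shell
#!/bin/sh
# Print the resource limits of a running process (Linux only).
# proc_limits is a hypothetical helper name, not a SLURM command.
proc_limits() {
    pid="$1"
    # /proc/<pid>/limits lists the soft and hard limits for that process.
    cat "/proc/$pid/limits"
}

# Example, assuming the compute-node daemon runs under the name slurmd:
#   proc_limits "$(pgrep -o slurmd)"
```

If the "Max locked memory" or "Max address space" rows there differ from the shell's values, the init script is the likely place to look.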
Thanks,
~Mike C.
From: Morris Jette [mailto:[email protected]]
Sent: Thursday, April 09, 2015 10:59 AM
To: slurm-dev
Subject: [slurm-dev] Re: default memory limit (14.11.5)?
Do you have a DefMemPerCPU or DefMemPerNode configured in slurm.conf?
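
A quick way to check is to search the config file for the memory parameters. This is only a sketch: the helper name `show_mem_settings` and the path in the example are assumptions, so substitute wherever slurm.conf lives on your system.

```shell
#!/bin/sh
# List memory-related defaults and limits set in a slurm.conf-style file.
# show_mem_settings is a hypothetical helper name, not a SLURM command.
show_mem_settings() {
    conf="$1"
    # Match DefMemPerCPU/DefMemPerNode/MaxMemPerCPU/MaxMemPerNode anywhere
    # on a line (they may also appear on PartitionName lines), then drop
    # lines that are commented out.
    grep -iE '(Def|Max)MemPer(CPU|Node)[[:space:]]*=' "$conf" \
        | grep -v '^[[:space:]]*#'
}

# Example (the path is an assumption):
#   show_mem_settings /etc/slurm/slurm.conf
# To see the values the running controller actually uses:
#   scontrol show config | grep -i 'MemPer'
```

No output from the function means none of these parameters is set in that file.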
On April 9, 2015 10:52:37 AM PDT, Michael Colonno <[email protected]> wrote:
Hi ~
I just upgraded my cluster to SLURM 14.11.5. Everything went
smoothly but when I run a test case it seems there is now a (very small) memory
limit on jobs:
$ srun -n4 date
slurmstepd: Step 19293.0 exceeded memory limit (3324 > 1024), being killed
srun: Exceeded job memory limit
slurmstepd: *** STEP 19293.0 CANCELLED AT 2015-04-09T10:46:17 *** on node6
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: error: node6: tasks 0-3: Killed
How can I disable / fix this?
Thanks,
~Mike C.