On Wednesday 19 April 2017 17:51:03 Mike Cammilleri wrote:
> Hi Slurm community,
> I hopefully have an easy question regarding CPU/partition configuration in
> Slurm. We are running Slurm 16.05.6 built on Ubuntu 14.04 LTS (because 14.04
> works with our current bcfg2 XML configuration management servers).
> Each node has two 12-core Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> processors. Running 'cat /proc/cpuinfo' reports 48 processors because each
> core consists of two hardware threads.
> I want to make sure that we are defining our CPUs and available cores to
> Slurm appropriately. What Slurm considers a CPU and what a process considers
> a thread can easily get mixed up in the semantics.
> Most users run R. R is single-threaded, so when someone submits a job it
> takes one thread and leaves the other thread on the core idle. So although a
> user thinks there are 48 cores available, in actuality only the 24 physical
> cores are available to them. If, however, they are running an app that can
> use multiple threads (Julia?), then things are different. We had been getting
> by up to this point, until a user ran a numpy workload in his Python 3.5 app,
> which resulted in all kinds of CPU overload and memory swapping. He is using
> job arrays of size 32, running one array task per job, and on one node, for
> example, 12 of his Python apps are running but all 48 CPUs are utilized.
> Load average is 300.0+. Sometimes memory is swapping and sometimes not.
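To the question of how to describe that hardware to Slurm: a node like the
one above is typically defined in slurm.conf along these lines (the node name
and memory figure here are placeholders, not your actual values):

NodeName=node01 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=128000 State=UNKNOWN

With ThreadsPerCore=2 and a consumable-resource selection by core (e.g.
SelectType=select/cons_res with SelectTypeParameters=CR_Core), Slurm
allocates whole cores, so a single-threaded R job gets both hyperthreads of
its core rather than sharing the core with another job.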
Using cgroups will help ensure that jobs cannot use more resources than they
asked for. Check your current settings with:
$ grep -i cgroup /etc/slurm-llnl/slurm.conf
$ cat /etc/slurm-llnl/cgroup.conf
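As a rough sketch of what to aim for (values are examples, adjust for your
site):

In slurm.conf:
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

In cgroup.conf:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes

With ConstrainCores=yes, a job that spawns more threads than it requested
(like the numpy case above) is confined to its allocated cores instead of
spreading across all 48 CPUs, and ConstrainRAMSpace=yes similarly caps its
memory.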
Graz University of Technology
Signal Processing and Speech Communication Laboratory