Chris, Mehdi, thank you.

I must say that, based on what I had read, salloc appeared to me to be the command for starting interactive jobs, while srun is to "Run parallel jobs". If I understand it correctly now, srun must be used to start interactive sessions:

srun --ntasks=1 --mem-per-cpu=1000 --pty /bin/bash

(and salloc should probably be removed from the list of tools available to our users).

Now, if I set in slurm.conf

DefMemPerCPU=800
MaxMemPerCPU=1600

and run

srun --ntasks=1 --pty /bin/bash

I get

memory.limit_in_bytes 838860800

I can still exceed MaxMemPerCPU on the command line, but at the cost of being allocated more cores:

srun --ntasks=1 --mem-per-cpu=2000 --pty /bin/bash
memory.limit_in_bytes 2097152000
cpuset.cpus 0-2

It is only when I hit the limit of (number of cores x 1600 MB) that I get an error:

srun --ntasks=1 --mem-per-cpu=20000 --pty /bin/bash
srun: Force Terminated job 52
srun: error: CPU count per node can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not available

So far so good.

-----Original Message-----
From: Mehdi Denou [mailto:[email protected]]
Sent: 07 May 2015 13:07
To: slurm-dev
Subject: [slurm-dev] Re: cgroups support in slurm (sbatch vs salloc)

Here is another example which is (from my point of view) less confusing:

[root@host1 ~]# salloc -N 1
salloc: Granted job allocation 8
[root@host1 ~]# srun hostname
host9
[root@host1 ~]# hostname
host1
[root@host1 ~]# exit
exit
salloc: Relinquishing job allocation 8
salloc: Job allocation 8 has been revoked.
[root@host1 ~]#

On 07/05/2015 13:28, Chris Samuel wrote:
> On Thu, 7 May 2015 04:01:25 AM Igor Kozin wrote:
>
>> My real question is why running
>> salloc --mem-per-cpu=1000 --ntasks=1 bash
>> does not create cgroups and therefore gets you an unlimited interactive
>> session?
> My understanding is that salloc will give you a session on the same node you
> run it, and you then need to use srun to launch a process on the assigned
> compute node (and thus into the relevant control group).
>
> To demonstrate, here is an example from one of our systems (Slurm 14.03.11),
> first just running hostname in salloc so you can see the shell is on the same
> node:
>
> [samuel@merri ~]$ salloc hostname
> salloc: Pending job allocation 2096414
> salloc: job 2096414 queued and waiting for resources
> salloc: job 2096414 has been allocated resources
> salloc: Granted job allocation 2096414
> merri
> salloc: Relinquishing job allocation 2096414
> [samuel@merri ~]$
>
> Now running hostname with srun inside salloc to show it appears on the compute
> node instead:
>
> [samuel@merri ~]$ salloc srun hostname
> salloc: Pending job allocation 2096415
> salloc: job 2096415 queued and waiting for resources
> salloc: job 2096415 has been allocated resources
> salloc: Granted job allocation 2096415
> Scratch directory /scratch/merri/jobs/2096415 has been allocated
> merri009
> salloc: Relinquishing job allocation 2096415
>
> Now to demonstrate that the one on the login node has (as expected) no cgroup
> whilst the one run with srun does run inside a cgroup:
>
> [samuel@merri ~]$ salloc cat /proc/self/cpuset
> salloc: Pending job allocation 2096416
> salloc: job 2096416 queued and waiting for resources
> salloc: job 2096416 has been allocated resources
> salloc: Granted job allocation 2096416
> /
> salloc: Relinquishing job allocation 2096416
> salloc: Job allocation 2096416 has been revoked.
> [samuel@merri ~]$
>
> [samuel@merri ~]$ salloc srun cat /proc/self/cpuset
> salloc: Pending job allocation 2096417
> salloc: job 2096417 queued and waiting for resources
> salloc: job 2096417 has been allocated resources
> salloc: Granted job allocation 2096417
> Scratch directory /scratch/merri/jobs/2096417 has been allocated
> /slurm/uid_500/job_2096417/step_0
> salloc: Relinquishing job allocation 2096417
> salloc: Job allocation 2096417 has been revoked.
> [samuel@merri ~]$
>
> Hope that helps!
>
> All the best,
> Chris

--
Mehdi Denou
International HPC support
+336 45 57 66 56
