On Thu, 7 May 2015 04:01:25 AM Igor Kozin wrote:

> My real question is why running
> salloc --mem-per-cpu=1000 --ntasks=1 bash
> does not create cgroups and therefore gets you an unlimited interactive
> session?

My understanding is that salloc gives you a session on the same node you 
run it on, and you then need to use srun to launch a process on the assigned 
compute node (and thus inside the relevant control group).

To demonstrate, here is an example from one of our systems (Slurm 14.03.11), 
first just running hostname in salloc so you can see the shell is on the same 
node:

[samuel@merri ~]$ salloc hostname
salloc: Pending job allocation 2096414
salloc: job 2096414 queued and waiting for resources
salloc: job 2096414 has been allocated resources
salloc: Granted job allocation 2096414
merri
salloc: Relinquishing job allocation 2096414
[samuel@merri ~]$ 


Now running hostname with srun inside salloc to show it appears on the compute 
node instead:

[samuel@merri ~]$ salloc srun hostname
salloc: Pending job allocation 2096415
salloc: job 2096415 queued and waiting for resources
salloc: job 2096415 has been allocated resources
salloc: Granted job allocation 2096415
Scratch directory /scratch/merri/jobs/2096415 has been allocated
merri009
salloc: Relinquishing job allocation 2096415


Now to demonstrate that the command run directly under salloc on the login 
node has (as expected) no cgroup, whilst the command run with srun does run 
inside a cgroup:

[samuel@merri ~]$ salloc cat /proc/self/cpuset
salloc: Pending job allocation 2096416
salloc: job 2096416 queued and waiting for resources
salloc: job 2096416 has been allocated resources
salloc: Granted job allocation 2096416
/
salloc: Relinquishing job allocation 2096416
salloc: Job allocation 2096416 has been revoked.
[samuel@merri ~]$ 

[samuel@merri ~]$ salloc srun cat /proc/self/cpuset
salloc: Pending job allocation 2096417
salloc: job 2096417 queued and waiting for resources
salloc: job 2096417 has been allocated resources
salloc: Granted job allocation 2096417
Scratch directory /scratch/merri/jobs/2096417 has been allocated
/slurm/uid_500/job_2096417/step_0
salloc: Relinquishing job allocation 2096417
salloc: Job allocation 2096417 has been revoked.
[samuel@merri ~]$ 
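
If you want to check this from inside a script rather than by eye, something 
along these lines might do. It is only a rough sketch: the function names are 
made up for illustration, and the path layout (/slurm/uid_N/job_N/step_N) is 
assumed from the output above, so other sites or Slurm versions may well use a 
different hierarchy.

```python
import re
from typing import Dict, Optional

# Layout assumed from the transcript above: /slurm/uid_<uid>/job_<jobid>/step_<step>
# (the step component is treated as optional). This is an assumption, not a
# documented format; adjust the pattern for your own site.
_SLURM_CPUSET = re.compile(r"^/slurm/uid_(\d+)/job_(\d+)(?:/step_(\w+))?$")


def parse_cpuset(path: str) -> Optional[Dict[str, Optional[str]]]:
    """Return uid/job/step if the path looks like a Slurm cpuset, else None."""
    match = _SLURM_CPUSET.match(path.strip())
    if match is None:
        # For example "/", the root cpuset, as seen for the plain salloc case.
        return None
    uid, job, step = match.groups()
    return {"uid": uid, "job": job, "step": step}


def read_own_cpuset(proc_file: str = "/proc/self/cpuset") -> str:
    """Read the cpuset path of the calling process (Linux only)."""
    with open(proc_file) as handle:
        return handle.read().strip()
```

Calling parse_cpuset(read_own_cpuset()) from a process started directly under 
salloc should give None on a setup like ours, whereas one started via srun 
should give back the uid, job and step.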


Hope that helps!

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
