[slurm-users] ntasks and gres question

2022-04-06 Thread Chip Seraphine
Hello, In my cluster, every node has one instance of a gres called ‘io_nic’. The intent is to make it easier for users to ensure that jobs which perform heavy network I/O do not get scheduled on the same machine at the same time. $ sinfo -N -o '%N %Gres' NODELIST GRESres
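A minimal sketch of how such a per-node GRES can be wired up (node names, counts, and the script name are illustrative, not taken from Chip's cluster):

# slurm.conf -- declare the custom GRES type and attach one per node
GresTypes=io_nic
NodeName=node[01-10] Gres=io_nic:1 State=UNKNOWN

# gres.conf on each node -- a purely countable resource, no device file
Name=io_nic Count=1

# an I/O-heavy job requests the single io_nic, so two such jobs can never share a node
$ sbatch --gres=io_nic:1 transfer_job.sh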

[slurm-users] Strange memory limit behavior with --mem-per-gpu

2022-04-06 Thread Paul Raines
I have a user who submitted an interactive srun job using: srun --mem-per-gpu 64 --gpus 1 --nodes 1 From sacct for this job we see: ReqTRES : billing=4,cpu=1,gres/gpu=1,mem=10G,node=1 AllocTRES : billing=4,cpu=1,gres/gpu=1,mem=64M,node=1 (where 10G I assume comes from
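For reference (not part of the original message): Slurm memory options are read as megabytes when no unit suffix is given, so "--mem-per-gpu 64" asks for 64 MB. A sketch of what was presumably intended, with an explicit unit:

# request 64 GB of memory per allocated GPU (note the G suffix)
$ srun --mem-per-gpu=64G --gpus=1 --nodes=1 --pty bash

# confirm what was actually granted
$ sacct -j <jobid> --format=JobID,ReqTRES%60,AllocTRES%60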

Re: [slurm-users] Memory usage not tracked

2022-04-06 Thread Chin,David
Hi, Xand: How does adding "ReqMem" to the sacct format string change the output? E.g. on my cluster running Slurm 20.02.7 (on RHEL8), our GPU nodes have TRESBillingWeights=CPU=0,Mem=0,GRES/gpu=43: $ sacct --format=JobID%25,State,AllocTRES%50,ReqTRES,ReqMem,ReqCPUS|grep RUNNING JobID
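For readers unfamiliar with that knob: TRESBillingWeights is a partition-level option in slurm.conf. A hedged sketch (partition and node names are made up; the weights mirror the ones David quotes):

# slurm.conf -- bill jobs in the GPU partition purely by GPU count
PartitionName=gpu Nodes=gpu[01-04] TRESBillingWeights="CPU=0,Mem=0,GRES/gpu=43"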

Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Stephen Cousins
Hi Sushil, Try changing the NodeName specification to: NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu:8 Also: TaskPlugin=task/cgroup Best, Steve On Wed, Apr 6, 2022 at 9:56 AM Sushil Mishra wrote: > Dear SLURM users, > > I am very new to Slurm and need some help in configuring slurm in
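A sketch of the slurm.conf fragment that suggestion implies (the partition line and the cgroup process-tracking setting are added here for completeness and are not quoted from the thread; counts must match the real hardware and gres.conf):

# slurm.conf
GresTypes=gpu
NodeName=localhost CPUs=96 Gres=gpu:8 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES State=UP
TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup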

Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Kamil Wilczek
Hello, try to comment out the line: AutoDetect=nvml And then restart "slurmd" and "slurmctld". Job allocations to the same GPU might be an effect of automatic MPS configuration, though I'm not 100% sure: https://slurm.schedmd.com/gres.html#MPS_Management Kind Regards -- Kamil
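Roughly what that looks like on disk, as a sketch (device paths are illustrative):

# gres.conf -- NVML autodetection disabled, GPUs listed explicitly
#AutoDetect=nvml
Name=gpu File=/dev/nvidia[0-7]

# then restart the daemons
$ systemctl restart slurmd      # on the compute node
$ systemctl restart slurmctld   # on the controller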

Re: [slurm-users] [EXTERNAL] Re: Managing shared memory (/dev/shm) usage per job?

2022-04-06 Thread John Hanks
Thanks, Greg! This looks like the right way to do this. I will have to stop putting off learning to use spank plugins :) griznog On Wed, Apr 6, 2022 at 1:40 AM Greg Wickham wrote: > Hi John, Mark, > > > > We use a spank plugin > https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this

Re: [slurm-users] [EXTERNAL] Re: Managing shared memory (/dev/shm) usage per job?

2022-04-06 Thread Greg Wickham
Hi John, Mark, We use a spank plugin https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this was derived from other authors but modified for functionality required on site). It can bind tmpfs mount points to the user's cgroup allocation; additionally, bind options can be provided (i.e.:
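For anyone wanting to try it: SPANK plugins are enabled through plugstack.conf. A hedged sketch (install path is illustrative; the plugin's own arguments, such as which paths to bind, are documented in its repository rather than guessed at here):

# /etc/slurm/plugstack.conf
optional /usr/lib64/slurm/private-tmpdir.so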