Hello,
In my cluster, every node has one instance of a GRES called 'io_nic'. Its
purpose is to make it easy for users to ensure that jobs performing heavy
network I/O are not scheduled simultaneously on the same machine.
$ sinfo -N -o '%N %Gres'
NODELIST GRESres
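For context, a GRES like this is typically declared in two places; the node names and counts below are illustrative assumptions, not taken from the thread:

```
# slurm.conf (controller) -- node names and GPU count are assumptions
GresTypes=gpu,io_nic
NodeName=node[01-10] Gres=gpu:4,io_nic:1

# gres.conf (on each node) -- io_nic maps to no device file,
# so a bare Count is enough
Name=io_nic Count=1
```

A job that will hammer the network would then request it with something like `sbatch --gres=io_nic:1 job.sh`, so only one such job lands per node.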
I have a user who submitted an interactive srun job using:
srun --mem-per-gpu 64 --gpus 1 --nodes 1
From sacct for this job we see:
ReqTRES : billing=4,cpu=1,gres/gpu=1,mem=10G,node=1
AllocTRES : billing=4,cpu=1,gres/gpu=1,mem=64M,node=1
(where 10G I assume comes from
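One likely explanation for the 64M allocation: Slurm memory options default to megabytes when no unit suffix is given, so `--mem-per-gpu 64` requests 64 MB. Assuming the user actually wanted 64 GB, the invocation would be:

```
# Append a unit suffix; without one, memory values are read as MB
srun --mem-per-gpu=64G --gpus=1 --nodes=1 --pty bash
```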
Hi, Xand:
How does adding "ReqMem" to the sacct format string change the output?
E.g. on my cluster running Slurm 20.02.7 (on RHEL8), our GPU nodes have
TRESBillingWeights=CPU=0,Mem=0,GRES/gpu=43:
$ sacct --format=JobID%25,State,AllocTRES%50,ReqTRES,ReqMem,ReqCPUS|grep RUNNING
JobID
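As a sanity check on how those weights turn into the billing TRES: by default Slurm computes a plain weighted sum over the allocated TRES (PriorityFlags=MAX_TRES changes this). A small sketch, using the weights from the line above:

```python
# Hedged sketch of Slurm's default billing computation: a weighted sum
# of allocated TRES counts. Weight/allocation values mirror the thread's
# example; they are illustrative, not a definitive implementation.

def billing_tres(weights, alloc):
    """Sum of weight * allocated count over all TRES names."""
    return sum(weights.get(name, 0.0) * count for name, count in alloc.items())

# TRESBillingWeights=CPU=0,Mem=0,GRES/gpu=43
weights = {"CPU": 0.0, "Mem": 0.0, "GRES/gpu": 43.0}
alloc = {"CPU": 1, "Mem": 64, "GRES/gpu": 1}   # a one-GPU job

print(billing_tres(weights, alloc))  # 43.0
```

With CPU and memory weighted at zero, only the GPU count contributes, so each GPU billed is worth 43.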
Hi Sushil,
Try changing NodeName specification to:
NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu:8
Also:
TaskPlugin=task/cgroup
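A minimal gres.conf to go with that NodeName line might look like the following; the device paths and the cgroup setting are assumptions to be checked against the actual node:

```
# gres.conf -- device paths assume 8 NVIDIA GPUs on this host
Name=gpu File=/dev/nvidia[0-7]

# cgroup.conf -- needed for task/cgroup to actually fence jobs
# to their assigned GPU devices
ConstrainDevices=yes
```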
Best,
Steve
On Wed, Apr 6, 2022 at 9:56 AM Sushil Mishra
wrote:
> Dear SLURM users,
>
> I am very new to Slurm and need some help in configuring slurm in
Hello,
try to comment out the line:
AutoDetect=nvml
And then restart "slurmd" and "slurmctld".
Job allocations landing on the same GPU might be an effect of automatic MPS
configuration, though I'm not 100% sure:
https://slurm.schedmd.com/gres.html#MPS_Management
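On a systemd-managed cluster, that restart would typically be (assuming the standard unit names):

```
sudo systemctl restart slurmd       # on each compute node
sudo systemctl restart slurmctld    # on the controller
```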
Kind Regards
--
Kamil
Thanks, Greg! This looks like the right way to do this. I will have to stop
putting off learning to use spank plugins :)
griznog
On Wed, Apr 6, 2022 at 1:40 AM Greg Wickham
wrote:
Hi John, Mark,
We use a spank plugin
https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this was derived
from other authors but modified for functionality required on site).
It can bind tmpfs mount points into the user's cgroup allocation; additionally,
bind options can be provided (i.e.:
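For anyone following along, a SPANK plugin of this kind is enabled through plugstack.conf; the library path and option names below are illustrative assumptions, so check the plugin's README for the real ones:

```
# /etc/slurm/plugstack.conf -- path and options are hypothetical
required /usr/lib64/slurm/private-tmpdir.so base=/tmp mount=/tmp mount=/dev/shm
```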