Hey Slurm-users list,

While our regular GPU nodes work fine, our on-demand GPU nodes have a weird issue. They power up, I can SSH onto them and run nvidia-smi without any problem, but they are marked invalid and slurmctld logs:

   _node_config_validate: gres/gpu: Count changed on node (0 != 2)

However, scontrol show node shows that the GPUs are recognized, the gres.conf stored on the worker nodes is as expected, and the node entries in slurm.conf look fine, too:

   # slurm.conf
   NodeName=my_worker_node SocketsPerBoard=16 CoresPerSocket=1
   RealMemory=64075 MemSpecLimit=4000 State=CLOUD Gres=gpu:L4:2 # openstack

   # gres.conf on my_worker_node
   ubuntu@my_node:~$ cat /etc/slurm/gres.conf
   # GRES CONFIG
   Name=gpu Type=L4 File=/dev/nvidia0
   Name=gpu Type=L4 File=/dev/nvidia1

Thankful for any ideas or debugging tips.

Best,
Xaver

PS:
By executing:

   sudo scontrol update NodeName=$(bibiname 0) Gres=
   sudo scontrol reconfigure
   sudo scontrol update NodeName=$(bibiname 0) state=RESUME reason=None

the node can be resumed. However, this is a workaround, not a real solution.
-- 
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]