On 4/10/24 10:41 pm, archisman.pathak--- via slurm-users wrote:
In our case, that node has been removed from the cluster and cannot be
added back right now (it is being used for other work). What can we do
in such a case?
Mark the node as "DOWN" in Slurm; this is what we do when we get
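A minimal sketch of marking the node DOWN with `scontrol` (the node name `node01` and the reason string are placeholders; substitute your own):

```shell
# Mark the absent node DOWN so the scheduler stops considering it.
# A Reason string is required when setting a node DOWN by hand.
scontrol update NodeName=node01 State=DOWN Reason="node removed from cluster"

# Verify the new state:
sinfo -n node01 -o "%N %T %E"
```

If the node is gone for good, you would eventually also remove it from `slurm.conf` and run `scontrol reconfigure` on the controller.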
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
Could you give more details on this and how you debugged it?
We have Weka filesystems on one of our clusters and saw this; we discovered we
had slightly misconfigured the Weka client, with the result that Weka's and
Slurm's cgroups were fighting with each other, and this seemed to be the cause.
Fixing the Weka cgroups config improved the problem, for
Usually to clear jobs like this you have to reboot the node they are on.
That will then force the scheduler to clear them.
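A sketch of triggering that reboot through Slurm itself rather than by hand (node name `node01` is a placeholder; `scontrol reboot` needs `RebootProgram` configured in `slurm.conf`):

```shell
# Ask Slurm to reboot the node as soon as it can be drained,
# and return it to service automatically afterwards.
scontrol reboot ASAP nextstate=RESUME node01
```

After the node comes back, the stuck jobs should be cleared from the scheduler's view.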
-Paul Edmon-
On 4/10/2024 2:56 AM, archisman.pathak--- via slurm-users wrote:
We are running a Slurm cluster with version `slurm 22.05.8`. One of our users
has reported