Slurm can't kill the process, so it does not reallocate those resources. See:
http://slurm.schedmd.com/troubleshoot.html#completing
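As a rough illustration only, here is a small Python sketch for spotting jobs stuck in the completing state and printing the node-state commands an admin might then run by hand. It assumes squeue and scontrol are on the PATH. The helper names (parse_completing, recovery_commands, list_completing) and the "hung_proc" reason string are made up for this example. Setting a node DOWN and then RESUME is one possible workaround, not a fix for the underlying unkillable process, so check the node for hung processes first.

```python
import subprocess


def parse_completing(squeue_output):
    """Parse 'jobid|nodelist' lines into a list of (job_id, nodelist) tuples."""
    jobs = []
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, _, nodelist = line.partition("|")
        jobs.append((job_id, nodelist))
    return jobs


def recovery_commands(nodelist, reason="hung_proc"):
    """Build (but do not run) scontrol commands to cycle nodes DOWN then RESUME.

    The reason string is an arbitrary label chosen for this sketch.
    """
    return [
        ["scontrol", "update", f"NodeName={nodelist}", "State=DOWN", f"Reason={reason}"],
        ["scontrol", "update", f"NodeName={nodelist}", "State=RESUME"],
    ]


def list_completing():
    """Ask squeue for jobs in the COMPLETING (CG) state."""
    result = subprocess.run(
        ["squeue", "--noheader", "--states=COMPLETING", "--format=%i|%N"],
        capture_output=True, text=True, check=True,
    )
    return parse_completing(result.stdout)


if __name__ == "__main__":
    for job_id, nodelist in list_completing():
        print(f"job {job_id} is completing on {nodelist}; candidate commands:")
        for cmd in recovery_commands(nodelist):
            print("  " + " ".join(cmd))
```

The script only prints the commands and never runs them, so an admin can look at the node before changing its state.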
Quoting Michael Colonno <[email protected]>:
Hi ~
I've run into this issue with several different versions (currently
14.0.3) and I've never been able to find a root cause: Sometimes,
usually when a job is canceled, the job(s) enter state "CG" and the
corresponding nodes enter state "comp" or oscillate between "comp"
and "comp*". The slurm logs show a cancelation of a job but no other
errors or issues. This zombie state persists indefinitely. An admin
has to either manually restart the slurm process on the affected
nodes and set their state to idle to bring them back or, in some
cases, force-kill the process ID to stop the slurm process. Changing
the timeout setting in the config file does not seem to have any
effect. I am planning on updating versions to the latest but is
there anything I can do to prevent or circumvent this?
Thanks,
~Mike C.
--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support