Set the node to drain if other jobs are still running on it. Then set it to down, and then resume it. Setting it to down will kill and clear any jobs on the node.

scontrol update nodename=xxxxxxxx state=drain reason=job_sux
scontrol update nodename=xxxxxxxx state=down reason=job_sux
scontrol update nodename=xxxxxxxx state=resume

If it happens again, either reboot or stop and restart slurm. Make sure you verify it has stopped.

Doug

-----Original Message-----
From: Gene Soudlenkov [mailto:g.soudlen...@auckland.ac.nz]
Sent: Monday, April 10, 2017 12:56 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: Deleting jobs in Completing state on hung nodes

It happens sometimes - in our case epilogue code got stuck. Either check the processes and kill whichever ones belong to the user or simply reboot the nodes.

Cheers,
Gene

--
New Zealand eScience Infrastructure
Centre for eResearch
The University of Auckland
e: g.soudlen...@auckland.ac.nz
p: +64 9 3737599 ext 89834
c: +64 21 840 825
f: +64 9 373 7453
w: www.nesi.org.nz

On 11/04/17 07:52, Tus wrote:
>
> I have 2 nodes that have hardware issues and died with jobs running on
> them. I am not able to fix the nodes at the moment but want to delete
> the jobs that are stuck in completing state from slurm. I have set the
> nodes to DRAIN and tried scancel which did not work.
>
> How do I remove these jobs?
>
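
For anyone repeating this on several nodes, the drain, down, resume sequence from the reply at the top of this thread could be wrapped in a small shell helper. This is only a sketch: the function name cycle_node, the SCONTROL variable and the default reason string hung_node are made up for illustration and are not part of Slurm. By default it only prints the scontrol commands (dry run); set SCONTROL=scontrol to actually run them.

```shell
#!/bin/sh
# Sketch only: cycle a node through drain -> down -> resume.
# SCONTROL, cycle_node and the default reason "hung_node" are hypothetical
# names chosen for this example. The default is a dry run that echoes the
# commands; export SCONTROL=scontrol to execute them for real.
SCONTROL="${SCONTROL:-echo scontrol}"

cycle_node() {
    node="$1"
    reason="${2:-hung_node}"

    # Drain first so no new work is scheduled onto the node.
    $SCONTROL update nodename="$node" state=drain reason="$reason" || return 1

    # Down kills and clears any jobs still attached to the node.
    $SCONTROL update nodename="$node" state=down reason="$reason" || return 1

    # Resume returns the node to service.
    $SCONTROL update nodename="$node" state=resume
}
```

Calling cycle_node xxxxxxxx job_sux in dry-run mode prints the same three commands given in the reply, which makes it easy to check them before running anything against the cluster.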