This is typically due to non-killable processes. Slurm will repeatedly send SIGKILL, but the job stays in the CG state. Check for leftover processes on the affected nodes, then either reboot the node or cold-start slurmd there (the latter leaves the processes around).
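As a rough illustration of the "check for processes" step, here is a minimal Python sketch that lists processes a SIGKILL cannot remove. It assumes the stuck processes are in uninterruptible sleep (state D in /proc/<pid>/stat), which is a common cause but is not something stated above. The helper names parse_stat and find_stuck are made up for this example and are not part of Slurm.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def find_stuck(proc_root="/proc", states=("D",)):
    """List (pid, comm, state) for processes whose state is in `states`.

    Assumption: "D" (uninterruptible sleep) is the state of interest,
    since such processes do not act on SIGKILL until they wake up.
    """
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or stat is unreadable
        if state in states:
            stuck.append((pid, comm, state))
    return stuck


if __name__ == "__main__":
    for pid, comm, state in find_stuck():
        print(f"{pid}\t{state}\t{comm}")
```

Run it on the node that hosts the CG job. If it prints processes belonging to that job, they are likely what keeps the job from completing, and a reboot is probably the cleaner of the two options.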
Michel Bourget <[email protected]> wrote:

Hi all,

What could cause a job to remain in the Completing state (CG)? It can't be killed via scancel either. Any solution for this?

I noticed it happens many times when I "play" with slurm 2.2.7. I'd speculate it seems less frequent with version 2.3.3.

TIA

--
Michel Bourget - SGI - Linux Software Engineering
"Past BIOS POST, everything else is extra" (travis)
