Not really a slurm question, but here's my 2 cents:

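FWIW, if they are truly unkillable (reparented to PID 1 and kill -9 has no effect), you can generally only get rid of them with a reboot. Strictly speaking, uninterruptible processes are in state D (blocked in the kernel, usually on I/O) rather than true zombies (state Z), but signals will not help in either case.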
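To see which of a user's processes are actually stuck, something along these lines may help. It is only a rough sketch, assuming a Linux /proc filesystem; the function names parse_stat and stuck_processes are my own and not part of any existing tool. It reads the state field from /proc/<pid>/stat, where D means uninterruptible sleep and Z means zombie.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, _, rest = stat_line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    return int(left), comm, fields[0], int(fields[1])

def stuck_processes(uid, proc_root="/proc"):
    """List (pid, comm, state, ppid) for processes owned by uid in state D or Z."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # no /proc here (not Linux), nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        path = os.path.join(proc_root, entry)
        try:
            if os.stat(path).st_uid != uid:
                continue
            with open(os.path.join(path, "stat")) as fh:
                pid, comm, state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or was unreadable while we looked
        if state in ("D", "Z"):
            found.append((pid, comm, state, ppid))
    return found

if __name__ == "__main__":
    # Example: check the processes of the user running the script.
    for pid, comm, state, ppid in stuck_processes(os.getuid()):
        print(pid, ppid, state, comm)
```

Run as root and pass the affected user's numeric UID to stuck_processes() to check someone else's processes. Anything reported with state D will not respond to signals until the kernel call it is blocked in returns.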

If they aren't using much in the way of resources, you can just ignore them until your next maintenance window and then reboot.

This is one of the reasons I do not architect login nodes to allow access to applications or much of anything. Minimal everything.

If your login node gets quite a bit of traffic, you should look at setting up a load-balanced HA configuration for it. Users should not have much of anything running on a login node. Just submit your job and do your work on the compute node, even if it is an interactive job. That keeps your dev/test environment the same as the runtime environment.

Brian Andrus

On 7/19/2021 7:09 AM, Durai Arasan wrote:
Hello,

One of our Slurm users' accounts is hung with uninterruptible processes. These processes cannot be killed, so a restart is required. Is it possible to restart the user's login environment alone? I would like to avoid restarting the entire login node.

Thanks!
Durai
Max Planck Institute Tübingen
