Hello Brian,

I apologize if this was more of a general Linux question, but your recommendations on managing login nodes were useful.
Thanks,
Durai

On Mon, Jul 19, 2021 at 7:27 PM Brian Andrus <[email protected]> wrote:
> Not really a slurm question, but here's my 2 cents:
>
> FWIW, if they are true zombies (PPID 1 and kill -9 will not work) you
> can only get rid of them with a reboot.
>
> If they aren't eating much in the line of resources, you will want to
> just ignore them until your next maintenance and then reboot.
>
> This is one of the reasons I do not architect login nodes to allow
> access to applications or much of anything. Minimal everything.
>
> If your login node gets quite a bit of traffic, you should look at
> setting up a load-balanced HA configuration for them. Users should not
> have much of anything going on with a login node. Just submit your job
> and do your work on the node. Even if it is an interactive job. Keeps
> your dev/test environment the same as the runtime environment.
>
> Brian Andrus
>
> On 7/19/2021 7:09 AM, Durai Arasan wrote:
> > Hello,
> >
> > One of our slurm user's account is hung with uninterruptible
> > processes. These processes cannot be killed. Hence a restart is
> > required. Is it possible to restart the user's login environment
> > alone? I would like to not restart the entire login node.
> >
> > Thanks!
> > Durai
> > Max Planck Institute Tübingen
> >
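The thread touches on two process states that `ps` reports differently: `D` (uninterruptible sleep, which is what the original question describes) and `Z` (zombie). Before deciding whether a reboot is really needed, it can help to list a user's processes in either state together with their parent PID, since Brian's rule of thumb looks at PPID 1. The sketch below assumes a Linux host with a standard `/proc` filesystem; the function names `parse_stat` and `stuck_processes` are hypothetical helpers, not part of Slurm or any existing tool.

```python
import os
import sys
from typing import Iterator, NamedTuple


class ProcInfo(NamedTuple):
    pid: int
    comm: str
    state: str
    ppid: int


def parse_stat(line: str) -> ProcInfo:
    # /proc/<pid>/stat starts with "pid (comm) state ppid ...". The command
    # name may itself contain spaces or parentheses, so split on the LAST ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    fields = line[rpar + 1:].split()
    return ProcInfo(pid, comm, fields[0], int(fields[1]))


def stuck_processes(uid: int, proc_root: str = "/proc") -> Iterator[ProcInfo]:
    """Yield processes owned by `uid` in state D (uninterruptible) or Z (zombie).

    Ownership is approximated by the uid of the /proc/<pid> directory, which
    normally matches the process's effective uid.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        path = os.path.join(proc_root, entry)
        try:
            if os.stat(path).st_uid != uid:
                continue
            with open(os.path.join(path, "stat")) as f:
                info = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process exited between listdir() and open(), or is unreadable.
            continue
        if info.state in ("D", "Z"):
            yield info


if __name__ == "__main__":
    # Usage: python stuck_procs.py [uid]   (defaults to the current user)
    target_uid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getuid()
    for p in stuck_processes(target_uid):
        note = " (parent is PID 1)" if p.ppid == 1 else ""
        print(f"{p.pid} {p.comm} state={p.state} ppid={p.ppid}{note}")
```

A `D`-state process already has any SIGKILL pending but will not act on it until the kernel call it is blocked in returns, which is why `kill -9` appears to do nothing; a `Z` process has already exited and only lingers until its parent reaps it. The script only reports; it does not attempt to kill anything.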
