With higher core count machines and support for cgroups, we are now starting to 
share nodes within Slurm, but cleanup has been a challenge. We want to make 
sure that if a user has 2 jobs on a shared node, we don’t inadvertently kill 
processes from the wrong job.

Does anyone know if Slurm has a feature similar to the “uniqueuser” node access 
policy in Moab? This allowed for shared use of nodes, but only when the 
jobs were from different users. This made cleaning up processes in an epilog 
script simpler.

If not, how do other people clean up leftover processes on shared nodes? Do 
you use an epilog script to kill processes? If so, how do you determine which 
processes are from which jobs?

Thanks,

Naveed
