Hi Folks ~
A couple of (hopefully) simple questions; I can't find anything in the
man pages that obviously solves these. I have a fairly ordinary deployment in
which scheduling is done by core so that some high-memory systems can be
shared.
- Users have observed that jobs are sometimes moved from one node to
another while running, which makes the particular tool being used unhappy. Is
there a way to prevent this, either with a flag or a config-file entry?
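(The closest knobs I've found so far are the requeue controls, though I'm not
certain that requeueing is actually what's happening here:

    # per job, at submission time
    sbatch --no-requeue job.sh

    # or as a cluster-wide default in slurm.conf
    JobRequeue=0

If the moves are instead coming from preemption, I assume PreemptMode would
also be in play.)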
- When scheduling by core, the default behavior seems to be to fill up
the first node with tasks, then move to the second, and so on. Since memory is
being shared between tasks, it would be preferable to select a node on which no
other jobs (or the minimum number of other jobs) are running before piling onto
a node that is already running jobs. How can I tell SLURM the equivalent of
"pick an unused node first if available"?
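(The nearest thing I've turned up is the least-loaded-node option, though I
haven't tested it and may be misreading the docs. Roughly, in slurm.conf,
assuming select/cons_res with core-and-memory tracking:

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN

or per partition:

    # partition name and node list here are just placeholders
    PartitionName=highmem Nodes=node[01-04] LLN=YES

Is that the right mechanism, or is there something better?)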
Thanks,
~Mike C.