I've been using Slurm on a traditional CPU compute cluster, but am now
looking at a somewhat different issue. We recently purchased a single
machine with 10 high-end graphics cards to be used for CUDA calculations
and which will be shared among a couple of different user groups.
Does it make sense to use Slurm for scheduling in this case? We'll want
to do things like limit the number of GPUs any one user can use and
manage resource contention the same way one would for a cluster.
Potentially this would mean running slurmctld and slurmd on the same host?
Bonus question: these research groups (they do roughly the same kind of
work) also have a pool of GPU workstations they're going to share. It
would be super cool if we could somehow rope the workstations into the
resource pool whenever no one is working at the console. Because
some of this work involves steps with interactive components, the
understanding would be that all resources go to the console user
whenever one is present.
- [slurm-users] Running Slurm on a single host? Patrick Goetz