What you are describing is possible in both theory and practice; plenty
of people use a scheduler on a single large host.  The challenge will be
enforcing user practices so that jobs are submitted through the scheduler
rather than run directly on the host.
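
For what it's worth, running slurmctld and slurmd side by side on one box
is a normal setup.  A minimal sketch of the relevant config, with a
made-up hostname (gpubox) and made-up CPU/memory/device values you would
replace with your own:

    # slurm.conf (excerpt) -- one host acting as controller and compute node
    ControlMachine=gpubox
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    GresTypes=gpu
    NodeName=gpubox CPUs=40 RealMemory=256000 Gres=gpu:10 State=UNKNOWN
    PartitionName=gpu Nodes=gpubox Default=YES MaxTime=INFINITE State=UP

    # gres.conf on the same host
    Name=gpu File=/dev/nvidia[0-9]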

On Fri, Apr 6, 2018 at 10:00 AM, Patrick Goetz <> wrote:

> I've been using Slurm on a traditional CPU compute cluster, but am now
> looking at a somewhat different issue.  We recently purchased a single
> machine with 10 high-end graphics cards to be used for CUDA calculations
> and which will be shared among a couple of different user groups.
>
> Does it make sense to use Slurm for scheduling in this case?  We'll want
> to do things like limit the number of GPUs any one user can use and manage
> resource contention the same way one would for a cluster. Potentially this
> would mean running slurmctld and slurmd on the same host?
>
> Bonus question: these research groups (they do roughly the same kind of
> work) also have a pool of GPU workstations they're going to share.  It
> would be super cool if we could somehow rope the workstations into the
> resource pool in cases where no one is working at the console. Because some
> of this stuff involves steps with interactive components, the understanding
> would be that all resources go to a console user when there is a console
> user.
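
On the per-user GPU limit: the usual way is a QOS with MaxTRESPerUser,
which needs accounting (slurmdbd) running and AccountingStorageEnforce
set to include limits/qos.  The QOS name, the user, and the cap of 2
GPUs below are only examples:

    # In slurm.conf, make sure GPUs are tracked as a TRES:
    #   AccountingStorageTRES=gres/gpu
    sacctmgr add qos gpushare
    sacctmgr modify qos gpushare set MaxTRESPerUser=gres/gpu=2
    sacctmgr modify user someuser set qos=gpushare

Users then request GPUs explicitly (e.g. sbatch --gres=gpu:2 job.sh) and
Slurm arbitrates the contention just as it would on a cluster.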
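
On the workstations: you can run slurmd on each of them and drain the
node while someone is at the console.  A rough sketch of that idea, run
periodically (e.g. from cron) on each workstation; the console check
here is deliberately naive:

    #!/bin/sh
    # Drain this node while a console user is logged in; resume when free.
    if who | grep -qE 'tty|:0'; then
        scontrol update NodeName=$(hostname -s) State=DRAIN Reason="console in use"
    else
        scontrol update NodeName=$(hostname -s) State=RESUME
    fi

Note that draining only blocks new jobs; anything already running keeps
its GPUs, so you'd want preemption or requeueing if the console user
must get the machine back immediately.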
