Would MaxCPUsPerNode set at the partition level help? Here's the snippet from the man page:

MaxCPUsPerNode
    Maximum number of CPUs on any node available to all jobs from this
    partition. This can be especially useful to schedule GPUs. For example
    a node can be associated with two Slurm partitions (e.g. "cpu" and
    "gpu") and the partition/queue "cpu" could be limited to only a subset
    of the node's CPUs, insuring that one or more CPUs would be available
    to jobs in the "gpu" partition/queue.

Sent from my iPhone

> On Apr 6, 2015, at 11:25 PM, Novosielski, Ryan <[email protected]> wrote:
>
> I imagine part of the reason is to keep people from running CPU jobs that
> would take more than 20 cores on the GPU machine, as the others do not
> have GPUs. I'd be interested in knowing strategies here too.
>
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS      |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | [email protected] 973/972.0922 (2x0922)
> ||  \\ Sciences  | OIRT/High Perf & Res Comp - MSB C630, Newark
>      `'
>
>> On Apr 6, 2015, at 20:17, Ryan Cox <[email protected]> wrote:
>>
>> Chris,
>>
>> Just have GPU users request the number of CPU cores that they need and
>> don't lie to Slurm about the number of cores. If a GPU user needs 4
>> cores and 4 GPUs, have them request that. That leaves 20 cores for
>> others to use.
>>
>> Ryan
>>
>>> On 04/06/2015 03:43 PM, Christopher B Coffey wrote:
>>> Hello,
>>>
>>> I'm curious how you handle the allocation of GPUs and cores on GPU
>>> systems in your cluster. My new GPU system is 24 core, with 2 Tesla
>>> K80s (4 GPUs total). We allocate cores/mem by:
>>>
>>> SelectType=select/cons_res
>>> SelectTypeParameters=CR_Core_Memory
>>>
>>> What I'm thinking of doing is lying to Slurm about the true cores, and
>>> specifying CPUs=20, along with Gres=gpu:tesla:4. Is this a reasonable
>>> solution in order to ensure there is a core reserved for each GPU in
>>> the system? My thought is to allocate the 20 cores on the system to
>>> non-GPU type work instead of leaving them idle.
>>>
>>> Thanks!
>>>
>>> Chris
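
For reference, a rough sketch of how the MaxCPUsPerNode approach might look in slurm.conf for the 24-core, 4-GPU node described in the thread. The node name "gpunode01" and the partition names "cpu" and "gpu" are hypothetical, and a matching gres.conf on the node plus any memory settings are assumed but not shown:

```
# slurm.conf fragment (sketch only; node and partition names are made up)
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu

# Report the node's real core count instead of understating it.
NodeName=gpunode01 CPUs=24 Gres=gpu:tesla:4

# Jobs in the "cpu" partition may use at most 20 of the 24 cores on the
# node, which leaves 4 cores available to jobs in the "gpu" partition.
PartitionName=cpu Nodes=gpunode01 MaxCPUsPerNode=20 Default=YES
PartitionName=gpu Nodes=gpunode01
```

With something like this in place, a GPU user would presumably submit to the "gpu" partition and ask for what they actually need, e.g. -p gpu --gres=gpu:tesla:4 -n 4, while CPU-only work goes to "cpu" and can never take the last 4 cores.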
