We aren't doing that. I agree we probably should. If you work out a config, and don't mind doing so, please share it.
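
For reference, the "MaxCPUsPerNode" approach John quotes in the thread below might look roughly like this on Felix's nodes (12 cores, 2 GPUs). This is an untested sketch: the partition names "cpu" and "gpu" and the limit of 10 are assumptions for illustration, and it does not address the NUMA/socket placement Daniel asks about.

```
# slurm.conf fragment (untested sketch; partition names and the limit are assumptions)
# node01 has 12 cores and 2 GPUs. Capping the CPU-only partition at 10 cores
# per node leaves 2 cores that only jobs in the "gpu" partition can use.
PartitionName=cpu Nodes=node01 MaxCPUsPerNode=10 Default=YES
PartitionName=gpu Nodes=node01
```

With something like this, jobs submitted to "cpu" could never take the last two cores, so a GPU job would still find a core free. Jobs in "gpu" are not capped here, so limiting how many cores a single GPU job takes would need a separate policy.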
On Thu, Mar 3, 2016 at 3:11 AM, Daniel Letai <[email protected]> wrote:
> Correct me if I'm wrong, but I don't see any NUMA-based reservation of the
> CPUs - Do you ensure that each reserved CPU is from a different socket, and
> that GPU job affinity is to the correct NUMA node?
>
>
> On 03/02/2016 12:30 AM, Lachele Foley wrote:
>
> We do exactly that. We use the CPUs as the consumable resource rather
> than the GPUs for that reason. We also limit memory use as needed.
> You might want to see the configuration issues we ran into and solved
> as recorded in the thread at the link below.
>
> https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y
>
>
> On Tue, Mar 1, 2016 at 1:27 PM, John Desantis <[email protected]> wrote:
>
> Felix,
>
> Although I haven't run into a use-case like yours (yet), my initial
> thought was to use the flag "MaxCPUsPerNode" in your configuration:
>
> 'Maximum number of CPUs on any node available to all jobs from this
> partition. This can be especially useful to schedule GPUs. For
> example a node can be associated with two Slurm partitions (e.g.
> "cpu" and "gpu") and the partition/queue "cpu" could be limited to
> only a subset of the node’s CPUs, insuring that one or more CPUs would
> be available to jobs in the "gpu" partition/queue.'
>
> HTH,
> John DeSantis
>
>
>
> 2016-03-01 9:05 GMT-05:00 Felix Willenborg
> <[email protected]>:
>
> Hey folks,
>
> I'm kind of new to SLURM and we're setting it up in our work group with our
> nodes. Our cluster contains per node 2 GPUs and 12 CPU cores.
>
> The GPUs are configured with gres like this:
> Name=gpu_mem Count=6143
> Name=gpu File=/dev/nvidia0
> Name=gpu File=/dev/nvidia1
> #Name=bandwidth count=4G
> (Somehow the bandwidth plugin isn't available in the repository slurm and I'm
> getting error messages with that. That's why it's commented out. Is it even
> necessary?)
>
> The nodes are defined like this in the slurm.conf:
> [...]
> NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2
> CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> Gres=gpu:3,gpu_mem:12287#,bandwidth:4G
>
>
> We'd like to have a situation where one CPU is always available for one GPU
> and can only be allocated with one GPU, because we often had the situation that
> reservations were made where all CPUs were allocated and we couldn't use the
> GPUs anymore. I searched on the internet and didn't find any similar cases
> which could help me. The only thing I found was adding "CPUS=0,1" at the end
> of every Name=gpu ... in gres.conf. Would this already do it? And if not,
> what can I do? I've got the feeling that I could solve my problem with SLURM
> in many ways. We're using SLURM version 14.11.8.
>
> Looking forward to some answers!
>
> Best wishes,
> Felix Willenborg
>
>
>

--
:-) Lachele
Lachele Foley
CCRC/UGA
Athens, GA USA
