Dear Lachele,

your suggestion is great! It would work if we had a completely homogeneous cluster, which is unfortunately not the case :(. All nodes have at least two graphics cards; one has 3, another has 4. Also, one node has a CPU with 16 cores. Setting MaxCPUsPerNode for a partition in slurm.conf would shut out hardware that would then never be used. That would be very sad.

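To make the mismatch concrete, here is a rough sketch in Python. The node names and the exact mix of cores and GPUs are assumptions for illustration only, not our real inventory, and the model is deliberately simplified: it assumes that a "cpu" partition capped with MaxCPUsPerNode leaves the remaining cores of each node to the "gpu" partition, and that the goal is exactly one core held back per GPU.

```python
# Hypothetical node inventory (names and hardware mix are assumptions).
NODES = {
    "node01": {"cpus": 12, "gpus": 2},
    "node02": {"cpus": 12, "gpus": 3},
    "node03": {"cpus": 12, "gpus": 4},
    "node04": {"cpus": 16, "gpus": 2},
}


def reserve_mismatch(nodes, cap):
    """Per node: cores left outside the capped "cpu" partition minus GPU count.

    0 means exactly one core is held back per GPU, a positive value means
    more cores are held back than GPUs exist, a negative value means too few.
    """
    result = {}
    for name, hw in nodes.items():
        left_for_gpu = hw["cpus"] - min(cap, hw["cpus"])
        result[name] = left_for_gpu - hw["gpus"]
    return result


def caps_that_fit_everywhere(nodes):
    """All single cap values that hold back exactly one core per GPU on every node."""
    max_cpus = max(hw["cpus"] for hw in nodes.values())
    return [
        cap
        for cap in range(1, max_cpus + 1)
        if all(v == 0 for v in reserve_mismatch(nodes, cap).values())
    ]


if __name__ == "__main__":
    print(reserve_mismatch(NODES, 8))
    print(caps_that_fit_everywhere(NODES))
```

With this made-up inventory no single cap fits every node, because each hardware combination would need its own value.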
Best,
Felix

On 01.03.2016 23:29, Lachele Foley wrote:
> We do exactly that. We use the CPUs as the consumable resource rather
> than the GPUs for that reason. We also limit memory use as needed.
> You might want to see the configuration issues we ran into and solved
> as recorded in the thread at the link below.
>
> https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y
>
>
> On Tue, Mar 1, 2016 at 1:27 PM, John Desantis <[email protected]> wrote:
>> Felix,
>>
>> Although I haven't run into a use-case like yours (yet), my initial
>> thought was to use the flag "MaxCPUsPerNode" in your configuration:
>>
>> 'Maximum number of CPUs on any node available to all jobs from this
>> partition. This can be especially useful to schedule GPUs. For
>> example a node can be associated with two Slurm partitions (e.g.
>> "cpu" and "gpu") and the partition/queue "cpu" could be limited to
>> only a subset of the node's CPUs, insuring that one or more CPUs would
>> be available to jobs in the "gpu" partition/queue.'
>>
>> HTH,
>> John DeSantis
>>
>>
>>
>> 2016-03-01 9:05 GMT-05:00 Felix Willenborg
>> <[email protected]>:
>>> Hey folks,
>>>
>>> I'm kind of new to SLURM and we're setting it up in our work group with our
>>> nodes. Our cluster contains per node 2 GPUs and 12 CPU cores.
>>>
>>> The GPUs are configured with gres like this:
>>> Name=gpu_mem Count=6143
>>> Name=gpu File=/dev/nvidia0
>>> Name=gpu File=/dev/nvidia1
>>> #Name=bandwidth count=4G
>>> (Somehow the bandwidth plugin isn't available in the repository slurm and I'm
>>> getting error messages with that. That's why it's commented out. Is it even
>>> necessary?)
>>>
>>> The nodes are defined like this in the slurm.conf:
>>> [...]
>>> NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2
>>> CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
>>> Gres=gpu:3,gpu_mem:12287#,bandwidth:4G
>>>
>>>
>>> We'd like to have a situation where one CPU is always available for one GPU
>>> and can only be allocated with one GPU, because we often had the situation that
>>> reservations were made where all CPUs were allocated and we couldn't use the
>>> GPUs anymore. I searched on the internet and didn't find any similar cases
>>> which could help me. The only thing I found was adding "CPUS=0,1" at the end
>>> of every Name=gpu ... in gres.conf. Would this already do it? And if not,
>>> what can I do? I've got the feeling that I could solve my problem with SLURM
>>> in many ways. We're using SLURM version 14.11.8.
>>>
>>> Looking forward to some answers!
>>>
>>> Best wishes,
>>> Felix Willenborg
>
>
