Hey everyone, sorry for digging up this old post, but I searched very intensively and didn't find a solution. My suggestion would therefore be to add a flag for partitions like "MinCPUsPerGPUs" or something similar. Would that be useful? Or does someone have a good idea for solving my problem anyway?
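To make the idea a bit more concrete, here is a rough Python sketch of the arithmetic such a flag would have to do per node. Note that "MinCPUsPerGPUs" (min_cpus_per_gpu below) is only my proposal and does not exist in Slurm, and the node names and exact CPU/GPU pairings are made up for illustration.

```python
# Illustrative inventory only: node names and the exact CPU/GPU
# pairings are assumptions, not our real slurm.conf.
NODES = {
    "node01": {"cpus": 12, "gpus": 3},
    "node02": {"cpus": 12, "gpus": 2},
    "node03": {"cpus": 12, "gpus": 4},
    "node04": {"cpus": 16, "gpus": 2},
}


def cpu_only_cap(cpus, gpus, min_cpus_per_gpu=1):
    """Return how many CPUs CPU-only jobs may use on one node while
    keeping min_cpus_per_gpu CPUs free for every GPU on that node.

    min_cpus_per_gpu mirrors the proposed (hypothetical) partition flag.
    """
    reserved = gpus * min_cpus_per_gpu
    if reserved > cpus:
        raise ValueError("cannot reserve more CPUs than the node has")
    return cpus - reserved


def caps_by_node(nodes, min_cpus_per_gpu=1):
    """Compute the CPU-only cap for every node in the inventory."""
    return {
        name: cpu_only_cap(hw["cpus"], hw["gpus"], min_cpus_per_gpu)
        for name, hw in nodes.items()
    }


if __name__ == "__main__":
    for name, cap in sorted(caps_by_node(NODES).items()):
        print(f"{name}: CPU-only jobs may use at most {cap} CPUs")
```

With this example inventory the cap comes out differently for each node (9, 10, 8 and 14 CPUs), which is why a single MaxCPUsPerNode value for the whole partition doesn't fit our cluster.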
Best,
Felix Willenborg

On 07.03.2016 at 12:55, Felix Willenborg wrote:
> Dear Lachele,
>
> your suggestion is great! It would work if we had a completely
> homogeneous cluster, which is unfortunately not the case :(. All nodes
> have at least two graphics cards, one has 3, another has 4. Also, one node
> has a CPU with 16 cores. With MaxCPUsPerNode in slurm.conf for a
> partition I'd exclude hardware which would then never be used. That
> would be very sad.
>
> Best,
> Felix
>
> On 01.03.2016 23:29, Lachele Foley wrote:
>> We do exactly that. We use the CPUs as the consumable resource rather
>> than the GPUs for that reason. We also limit memory use as needed.
>> You might want to see the configuration issues we ran into and solved
>> as recorded in the thread at the link below.
>>
>> https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y
>>
>>
>> On Tue, Mar 1, 2016 at 1:27 PM, John Desantis <[email protected]> wrote:
>>> Felix,
>>>
>>> Although I haven't run into a use-case like yours (yet), my initial
>>> thought was to use the flag "MaxCPUsPerNode" in your configuration:
>>>
>>> 'Maximum number of CPUs on any node available to all jobs from this
>>> partition. This can be especially useful to schedule GPUs. For
>>> example a node can be associated with two Slurm partitions (e.g.
>>> "cpu" and "gpu") and the partition/queue "cpu" could be limited to
>>> only a subset of the node’s CPUs, insuring that one or more CPUs would
>>> be available to jobs in the "gpu" partition/queue.'
>>>
>>> HTH,
>>> John DeSantis
>>>
>>>
>>>
>>> 2016-03-01 9:05 GMT-05:00 Felix Willenborg
>>> <[email protected]>:
>>>> Hey folks,
>>>>
>>>> I'm kind of new to SLURM and we're setting it up in our work group with
>>>> our nodes. Our cluster contains 2 GPUs and 12 CPU cores per node.
>>>>
>>>> The GPUs are configured with gres like this:
>>>> Name=gpu_mem Count=6143
>>>> Name=gpu File=/dev/nvidia0
>>>> Name=gpu File=/dev/nvidia1
>>>> #Name=bandwidth count=4G
>>>> (Somehow the bandwidth plugin isn't available in the repository slurm
>>>> and I'm getting error messages with that. That's why it's commented
>>>> out. Is it even necessary?)
>>>>
>>>> The nodes are defined like this in the slurm.conf:
>>>> [...]
>>>> NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2
>>>> CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
>>>> Gres=gpu:3,gpu_mem:12287#,bandwidth:4G
>>>>
>>>>
>>>> We'd like to have a situation where one CPU is always available for
>>>> each GPU and can only be allocated together with a GPU, because we
>>>> often had the situation that reservations were made where all CPUs
>>>> were allocated and we couldn't use the GPUs anymore. I searched on the
>>>> internet and didn't find any similar cases which could help me. The
>>>> only thing I found was adding "CPUS=0,1" at the end of every
>>>> Name=gpu ... in gres.conf. Would this already do it? And if not, what
>>>> can I do? I've got the feeling that I could solve my problem with
>>>> SLURM in many ways. We're using SLURM version 14.11.8.
>>>>
>>>> Looking forward to some answers!
>>>>
>>>> Best wishes,
>>>> Felix Willenborg
>>
