Dear Lachele,

Your suggestion is great! It would work if we had a completely
homogeneous cluster, which is unfortunately not the case :(. All nodes
have at least two graphics cards; one has 3, another has 4. Also, one node
has a CPU with 16 cores. With a single MaxCPUsPerNode value for a
partition in slurm.conf, I'd exclude hardware that would then never be
used. That would be very sad.
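
To make that concrete, here is a rough Python sketch of the arithmetic (not
anything Slurm itself provides). It assumes one core should stay free per GPU
and that a partition takes exactly one MaxCPUsPerNode value. The node names
and the pairing of core counts with GPU counts are made up for illustration;
only the counts themselves come from this thread.

```python
# Hypothetical inventory: node names and which node pairs which core count
# with which GPU count are assumptions, not our real layout.
nodes = {
    "node01": {"cores": 12, "gpus": 3},
    "node02": {"cores": 12, "gpus": 2},
    "node03": {"cores": 12, "gpus": 4},
    "node04": {"cores": 16, "gpus": 2},
}


def cpu_partition_caps(nodes):
    """Per-node cap for CPU-only jobs that keeps one core free per GPU."""
    return {name: n["cores"] - n["gpus"] for name, n in nodes.items()}


def stranded_cores(nodes):
    """Cores CPU-only jobs lose when one partition-wide cap covers all nodes."""
    caps = cpu_partition_caps(nodes)
    # A single value per partition has to fit the most constrained node.
    shared_cap = min(caps.values())
    return shared_cap, {name: cap - shared_cap for name, cap in caps.items()}


shared_cap, stranded = stranded_cores(nodes)
print("partition-wide MaxCPUsPerNode:", shared_cap)
for name, lost in stranded.items():
    print(name, "cores CPU-only jobs cannot reach:", lost)
```

With this made-up inventory the shared cap is dictated by the 4-GPU node, so
the 16-core node is hit hardest.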

Best,
Felix

On 01.03.2016 23:29, Lachele Foley wrote:
> We do exactly that.  We use the CPUs as the consumable resource rather
> than the GPUs for that reason.  We also limit memory use as needed.
> You might want to see the configuration issues we ran into and solved
> as recorded in the thread at the link below.
>
> https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y
>
>
> On Tue, Mar 1, 2016 at 1:27 PM, John Desantis <[email protected]> wrote:
>> Felix,
>>
>> Although I haven't run into a use-case like yours (yet), my initial
>> thought was to use the flag "MaxCPUsPerNode" in your configuration:
>>
>> 'Maximum number of CPUs on any node available to all jobs from this
>> partition.  This can be especially useful to schedule GPUs. For
>> example  a  node can  be  associated  with  two Slurm partitions (e.g.
>> "cpu" and "gpu") and the partition/queue "cpu" could be limited to
>> only a subset of the node’s CPUs, insuring that one or more CPUs would
>> be available to jobs in the "gpu" partition/queue.'
>>
>> HTH,
>> John DeSantis
>>
>>
>>
>> 2016-03-01 9:05 GMT-05:00 Felix Willenborg 
>> <[email protected]>:
>>> Hey folks,
>>>
>>> I'm kind of new to SLURM and we're setting it up in our work group on our
>>> nodes. Each node in our cluster has 2 GPUs and 12 CPU cores.
>>>
>>> The GPUs are configured with gres like this :
>>> Name=gpu_mem Count=6143
>>> Name=gpu File=/dev/nvidia0
>>> Name=gpu File=/dev/nvidia1
>>> #Name=bandwidth count=4G
>>> (Somehow the bandwidth plugin isn't available in the Slurm package from the
>>> repository, and I'm getting error messages with it. That's why it's
>>> commented out. Is it even necessary?)
>>>
>>> The nodes are defined like this in slurm.conf:
>>> [...]
>>> NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2
>>> CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
>>> Gres=gpu:3,gpu_mem:12287#,bandwidth:4G
>>>
>>>
>>> We'd like to have a situation where one CPU is always available for each
>>> GPU and can only be allocated together with a GPU, because we often had
>>> the situation that reservations were made where all CPUs were allocated
>>> and we couldn't use the GPUs anymore. I searched the internet and didn't
>>> find any similar cases that could help me. The only thing I found was
>>> adding "CPUS=0,1" at the end of every Name=gpu ... in gres.conf. Would
>>> this already do it? And if not, what can I do? I've got the feeling that
>>> I could solve my problem with SLURM in many ways. We're using SLURM
>>> version 14.11.8.
>>>
>>> Looking forward to some answers!
>>>
>>> Best wishes,
>>> Felix Willenborg
>
>
