Correct me if I'm wrong, but I don't see any NUMA-based reservation
of the CPUs. Do you ensure that each reserved CPU is from a
different socket, and that each GPU job's affinity is set to the
correct NUMA node?
On 03/02/2016 12:30 AM, Lachele Foley wrote:
We do exactly that. We use the CPUs as the consumable resource rather
than the GPUs for that reason. We also limit memory use as needed.
You might want to see the configuration issues we ran into and solved
as recorded in the thread at the link below.
https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y
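One way to express the approach described above (CPUs as the consumable
resource, plus memory limits) in slurm.conf is sketched below. This is not
the poster's actual configuration; the memory values are placeholders chosen
purely for illustration.

```
# slurm.conf (sketch; values are assumptions, not taken from the thread)
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory   # track CPUs and memory as consumable resources
DefMemPerCPU=2048                    # placeholder: default MB per allocated CPU
MaxMemPerCPU=8192                    # placeholder: upper limit in MB per allocated CPU
```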
On Tue, Mar 1, 2016 at 1:27 PM, John Desantis <[email protected]> wrote:
Felix,
Although I haven't run into a use-case like yours (yet), my initial
thought was to use the flag "MaxCPUsPerNode" in your configuration:
'Maximum number of CPUs on any node available to all jobs from this
partition. This can be especially useful to schedule GPUs. For
example a node can be associated with two Slurm partitions (e.g.
"cpu" and "gpu") and the partition/queue "cpu" could be limited to
only a subset of the node’s CPUs, insuring that one or more CPUs would
be available to jobs in the "gpu" partition/queue.'
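Applied to a node with 12 cores and 2 GPUs, as in the original question, the
suggestion might look like the following sketch. The partition names and the
limit of 10 are assumptions for illustration; they keep two CPUs per node out
of reach of jobs in the "cpu" partition.

```
# slurm.conf (sketch; partition names and limits are illustrative)
PartitionName=cpu Nodes=node01 MaxCPUsPerNode=10 Default=YES State=UP
PartitionName=gpu Nodes=node01 State=UP
```

CPU-only jobs would then go to "cpu", and GPU jobs to "gpu" with a request
such as --gres=gpu:1.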
HTH,
John DeSantis
2016-03-01 9:05 GMT-05:00 Felix Willenborg <[email protected]>:
Hey folks,
I'm kind of new to SLURM and we're setting it up in our work group with our
nodes. Each node in our cluster has 2 GPUs and 12 CPU cores.
The GPUs are configured with gres like this:
Name=gpu_mem Count=6143
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
#Name=bandwidth count=4G
(Somehow the bandwidth plugin isn't available in the Slurm package from the
repository and I'm getting error messages with that. That's why it's
commented out. Is it even necessary?)
The nodes are defined like this in slurm.conf:
[...]
NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2
CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
Gres=gpu:3,gpu_mem:12287#,bandwidth:4G
We'd like to have a situation where one CPU is always available for each GPU
and can only be allocated together with that GPU, because we often had
reservations in which all CPUs were allocated, so we couldn't use the GPUs
anymore. I searched the internet and didn't find any similar cases that
could help me. The only thing I found was adding "CPUS=0,1" at the end of
every Name=gpu ... line in gres.conf. Would this already do it? And if not,
what can I do? I've got the feeling that I could solve my problem with SLURM
in many ways. We're using SLURM version 14.11.8.
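For reference, the gres.conf variant mentioned above might look like the
sketch below. The core ranges assume that cores 0-5 sit on the socket next to
/dev/nvidia0 and cores 6-11 on the socket next to /dev/nvidia1; that layout is
an assumption and should be checked on the actual hardware (for example with
lstopo or nvidia-smi topo -m). As far as the gres.conf documentation describes
it, CPUs= restricts which CPUs may be allocated together with a GPU; it does
not by itself stop CPU-only jobs from taking those cores.

```
# gres.conf (sketch; the core-to-GPU mapping is an assumption)
Name=gpu File=/dev/nvidia0 CPUs=0-5
Name=gpu File=/dev/nvidia1 CPUs=6-11
```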
Looking forward to some answers!
Best wishes,
Felix Willenborg