[slurm-dev] Re: GPU node allocation policy

Christopher B Coffey Thu, 09 Apr 2015 11:19:26 -0700

Hi Ryan and others,

Thank you - those are some good ideas.


Experimenting a bit more, it does look like apps such as NAMD can take
advantage of more than one processor/GPU.  So my initial idea would be
too restrictive.


We don’t currently use any preemption, but I’m leaning towards doing the
following:

1. Create a GPU partition whose nodes are also in the “all” partition.
The GPU partition has a shorter wall time limit than “all”, a day or so.
2. Create a GPU QoS that adds priority points, like our debug QoS.
Limit access to the GPU QoS to folks actually using GPUs, to prevent
abuse.
3. Use the all_partitions job submit plugin to submit every job to all
of the partitions.  Jobs with a --time at or below the GPU partition
wall time could land there.  That gives users another good reason to
specify a short --time: quicker starts for their jobs, and access to
more resources.
4. Have the Lua job submit script add --qos=gpu to any job that requests
“--gres=gpu*”, so that Slurm gives the job the additional priority
points.
This should keep utilization of the CPU cores on the GPU nodes high
while still giving GPU jobs quicker access to the GPUs.
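
To make the intended behaviour of steps 3 and 4 concrete, here is a
rough model of the routing logic in Python.  It is only a sketch, not
the real plugin (that would be a Lua job_submit script running inside
slurmctld).  The Job class, the route() function and the wall-time
limits are all made up for illustration, and filtering partitions by
time limit is a simplification of what the scheduler would do.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical wall-time limits in minutes; not our real configuration.
PARTITION_LIMITS = {"all": 14 * 24 * 60, "gpu": 24 * 60}


@dataclass
class Job:
    time_limit: int                 # requested --time, in minutes
    gres: str = ""                  # e.g. "gpu:tesla:2"
    qos: Optional[str] = None
    partitions: List[str] = field(default_factory=list)


def route(job: Job) -> Job:
    """Model of steps 3 and 4: pick eligible partitions, tag GPU jobs."""
    # Step 3: the job may run in any partition whose limit covers --time.
    job.partitions = [
        name for name, limit in PARTITION_LIMITS.items()
        if job.time_limit <= limit
    ]
    # Step 4: any request for a gpu GRES gets the higher-priority QoS.
    if any(item.strip().startswith("gpu") for item in job.gres.split(",")):
        job.qos = "gpu"
    return job


if __name__ == "__main__":
    print(route(Job(time_limit=120, gres="gpu:tesla:2")))
    print(route(Job(time_limit=3 * 24 * 60)))
```

In this model a two-hour GPU job is eligible for both partitions and
picks up the gpu QoS, while a three-day CPU-only job stays in “all”
with no QoS change.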

I think this should work nicely.  Anyone see an issue?  Thanks!

Chris



On 4/7/15, 7:39 AM, "Ryan Cox" <[email protected]> wrote:

>
>You can do something like this: JobSubmitPlugins=all_partitions,lua.
>Have a special empty partition, as you suggest.  Use the submit plugin
>to detect if the empty partition is in there.  If it is in the job's
>list of partitions, you know that the user didn't specify a particular
>partition.  If it is not in the list, you know that the user requested a
>particular partition (or set of partitions).  You can then do all sorts
>of fun logic.
>
>Does all the GPU code in question need only one CPU core?  Some of our
>users have code that can use multiple CPUs and multiple GPUs
>simultaneously (LAMMPS? NAMD?  I'd have to check...).  It might be
>limiting to restrict users to a certain amount of cores.  If you're
>scheduling memory, it's also important to make sure that there is some
>memory available for the GPU jobs.
>
>What we do is use QOSs to control access to our GPU partition with
>AllowQos.  We use a job submit plugin to place jobs with the appropriate
>GRES into the gpu QOS, which is allowed into that partition.  We also
>allow jobs in a preemptible QOS into the partition, with the gpu QOS
>able to preempt jobs in the preemptible QOS.  We could also do a shorter
>walltime QOS or something with a lower priority but haven't done so; GPU
>jobs could get on there quickly even if all-CPU jobs are on there.  They
>could also have the job submit plugin add the gpu partition into their
>list of partitions if the job meets certain criteria even if not
>requesting GPUs (short walltime or something else).  Just some thoughts.
>
>Ryan
>
>On 04/07/2015 07:47 AM, Aaron Knister wrote:
>> Ah, I was wondering about that. You could try this:
>>
>> Rename standard partition to cpu1
>> Create a partition called standard with no nodes
>> Use the lua submit plugin to rewrite the partition list from standard
>>to cpu1,cpufromgpunode
>>
>> I *think* that will work. I'm not sure about the empty partition piece
>>and whether that will deny your submission before the submit filter
>>kicks in but my gut says no.
>>
>> Sent from my iPhone
>>
>>> On Apr 7, 2015, at 9:18 AM, Schmidtmann, Carl
>>><[email protected]> wrote:
>>>
>>> That only works if ALL the nodes have GPUs. We have 200+ nodes and 30
>>>of them have GPUs. So we have to create three partitions - standard,
>>>gpu and  cpufromgpunode. People in the standard partition can’t use the
>>>cpus on the gpu nodes. People that submit to the cpufromgpunode can’t
>>>use the cpus in the standard partition. We would like to see a way to
>>>specify MaxCPUsPerJobOnThisNode so the standard partition can use 24
>>>cores on nodes without a GPU and less on nodes with a GPU. Or a way to
>>>specify ReserveCPUForGPU on the node or some such thing. I assume this
>>>is difficult because people have asked for it but it hasn’t been
>>>implemented.
>>>
>>> Carl
>>>
>>> Carl Schmidtmann
>>> Center for Integrated Research Computing
>>> University of Rochester
>>>
>>>
>>>
>>>
>>>
>>>> On Apr 7, 2015, at 4:51 AM, Aaron Knister <[email protected]>
>>>>wrote:
>>>>
>>>> Would MaxCPUsPerNode set at the partition level help?
>>>>
>>>> Here's the snippet from the man page:
>>>>
>>>> MaxCPUsPerNode
>>>> Maximum number of CPUs on any node available to all jobs from this
>>>>partition. This can be especially useful to schedule GPUs. For example
>>>>a node can be associated with two Slurm partitions (e.g. "cpu" and
>>>>"gpu") and the partition/queue "cpu" could be limited to only a subset
>>>>of the node's CPUs, insuring that one or more CPUs would be available
>>>>to jobs in the "gpu" partition/queue.
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Apr 6, 2015, at 11:25 PM, Novosielski, Ryan
>>>>><[email protected]> wrote:
>>>>>
>>>>> I imagine part of the reason is to keep people from running CPU
>>>>>jobs that would take more than 20 cores on the GPU machine as others
>>>>>do not have GPU's. I'd be interested in knowing strategies here too.
>>>>>
>>>>> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>>>>> || \\UTGERS      |---------------------*O*---------------------
>>>>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>>>>> || \\ and Health | [email protected] 973/972.0922 (2x0922)
>>>>> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>>>>     `'
>>>>>
>>>>>> On Apr 6, 2015, at 20:17, Ryan Cox <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> Chris,
>>>>>>
>>>>>> Just have GPU users request the numbers of CPU cores that they need
>>>>>>and
>>>>>> don't lie to Slurm about the number of cores.  If a GPU user needs 4
>>>>>> cores and 4 GPUs, have them request that.  That leaves 20 cores for
>>>>>> others to use.
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>>> On 04/06/2015 03:43 PM, Christopher B Coffey wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I’m curious how you handle the allocation of GPU’s and cores on GPU
>>>>>>> systems in your cluster.  My new GPU system is 24 core, with 2
>>>>>>>Tesla K80’s
>>>>>>> (4 gpus total).  We allocate cores/mem by:
>>>>>>>
>>>>>>> SelectType=select/cons_res
>>>>>>> SelectTypeParameters=CR_Core_Memory
>>>>>>>
>>>>>>>
>>>>>>> What I’m thinking of doing is lying to Slurm about the true cores,
>>>>>>>and
>>>>>>> specifying CPUs=20, along with Gres=gpu:tesla:4.  Is this a
>>>>>>>reasonable
>>>>>>> solution in order to ensure there is a core reserved for each gpu
>>>>>>>in the
>>>>>>> system?  My thought is to allocate the 20 cores on the system to
>>>>>>>non-GPU
>>>>>>> type work instead of leaving them idle.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Chris
>
>-- 
>Ryan Cox
>Operations Director
>Fulton Supercomputing Lab
>Brigham Young University
