Hi,

On 21.12.2017 at 22:46, berg...@merctech.com wrote:

> In our cluster, we've got several different types of GPUs.
> 
> Some jobs simply need any GPU, while others require a specific type.
> 
> Previously, we had "gpu" declared as a BOOLEAN attribute on each GPU node
> and had the GPU type (i.e., TITANX, P100, etc.) declared as an INT attribute
> holding the number of GPUs of that type per node.
> 
> For example:
> 
>       qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node1
>       qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node2
>       qconf -aattr exechost complex_values gpu=TRUE,P100=2 node3
>       qconf -aattr exechost complex_values gpu=TRUE,P40=1 node4
> 
> A user could submit:
>       qsub -l gpu myjob
> and it could run on any of the nodes, or a user could run:
>       qsub -l TITANX=1 myjob
> and it could run on node1 or node2.
> 
> However... this led to over-subscription, as the 'gpu' BOOLEAN isn't a
> consumable resource.
> 
> I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
> making it a consumable resource, and updating our JSV (in perl) so that
> if the job is submitted as
> 
>       qsub -l gpu foobar
> 
> it will be altered to the equivalent of
> 
>       qsub -l gpu=1 foobar
> 
> to keep things easy for users.
> 
> Any suggestions about this plan?

Even with "-w n" I fear you will get a "missing value for request" error, as 
AFAIK the request is checked before the JSV is called*. In the past I proposed 
changing the default for an integer request without a value to one (it's quite 
easy to find the place in the source where a BOOL without a value is expanded), 
but the proposal was rejected.
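
For reference, the rewrite Mark describes comes down to a small transformation 
of the "-l" value. The following is only a sketch: the function name 
normalize_gpu_request is made up for illustration, and it says nothing about 
where the rewrite could run (inside a JSV if the check order permits it, or 
somewhere before qsub parses the request).

```shell
# Rewrite a comma-separated "-l" resource list so that a bare "gpu"
# becomes "gpu=1"; every other entry is passed through unchanged.
# normalize_gpu_request is a hypothetical helper, not part of SGE.
# The body runs in a subshell, so the IFS and globbing changes do
# not leak into the caller.
normalize_gpu_request() (
    set -f              # no filename expansion while splitting
    IFS=,
    out=""
    for item in $1; do
        if [ "$item" = "gpu" ]; then
            item="gpu=1"
        fi
        out="${out:+$out,}$item"
    done
    printf '%s\n' "$out"
)

normalize_gpu_request "gpu,h_vmem=4G"    # prints: gpu=1,h_vmem=4G
```

An explicit count such as "gpu=2" and type requests such as "TITANX=1" are 
left as they are, so only the bare form is changed.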

But: do you need to know which GPU will be used? Univa GE has a named resource 
for this. With SGE it might help to have one queue per GPU, each with a single 
slot; from the name (i.e. the suffix) of the granted queue you then know which 
GPU you have to use.
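
A job script could pick up the GPU from the queue name roughly as follows. 
This is a sketch under assumptions: the queue names (e.g. "gpu_0", "gpu_1") and 
the helper gpu_index_from_queue are hypothetical, and it relies on SGE 
exporting the granted cluster queue to the job as $QUEUE.

```shell
# Print the trailing digits of a queue name, e.g. "gpu_1" -> "1".
# Returns non-zero if the name does not end in a digit.
# gpu_index_from_queue is a hypothetical helper name.
gpu_index_from_queue() {
    idx=${1##*[!0-9]}   # strip everything up to the last non-digit
    if [ -z "$idx" ]; then
        return 1
    fi
    printf '%s\n' "$idx"
}

# Inside the job: restrict CUDA to the GPU matching the granted queue.
# Nothing is exported if the queue name carries no numeric suffix.
if idx=$(gpu_index_from_queue "${QUEUE:-}"); then
    export CUDA_VISIBLE_DEVICES="$idx"
fi
```

This assumes the numeric suffix matches the device numbering on the node; 
any other mapping would need a lookup table.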

-- Reuti

*) The "-w e" check is even performed twice: once before the JSV and once 
after. In my opinion this is not optimal, as it prevents submitting a 
completely malformed request and putting things in order inside the JSV. 
Admittedly, one problem is the fields that are fed to the JSV: how would a 
missing integer value be expressed (other than IEEE-style conventions such as 
NaN)?


> 
> Thanks,
> 
> Mark
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

