Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

2017-12-21 Thread Reuti
Hi,

On 21.12.2017 at 22:46, berg...@merctech.com wrote:

> In our cluster, we've got several different types of GPUs.
> 
> Some jobs simply need any GPU, while others require a specific type.
> 
> Previously, we had "gpu" declared as a BOOLEAN attribute on each GPU-node
> and had the GPU type (i.e., TITANX, P100, etc.) declared as an INT attribute
> set to the number of GPUs of that type per node.
> 
> For example:
> 
>   qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node1
>   qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node2
>   qconf -aattr exechost complex_values gpu=TRUE,P100=2 node3
>   qconf -aattr exechost complex_values gpu=TRUE,P40=1 node4
> 
> A user could submit:
>   qsub -l gpu myjob
> and it could run on any of the nodes, or a user could run:
>   qsub -l TITANX=1 myjob
> and it could run on node1 or node2.
> 
> However, this led to over-subscription, as the 'gpu' BOOLEAN isn't a
> consumable resource.
> 
> I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
> making it a consumable resource, and updating our JSV (in Perl) so that
> if the job is submitted as
> 
>   qsub -l gpu foobar
> 
> it will be altered to the equivalent of
> 
>   qsub -l gpu=1 foobar
> 
> to keep things easy for users.
> 
> Any suggestions about this plan?

Even with "-w n", I fear you will hit a "missing value for request" error, as
AFAIK this is checked before the JSV is called*. In the past I had the idea of
changing the default value for an integer request without a number to one (it's
quite easy to find the place in the source where a BOOL without a value is
expanded), but it was rejected.

But: do you need to know which GPU will be used? Univa GE has named resources
for this. With SGE it might help to have one queue with one slot per GPU; from
the suffix of the granted queue's name you then know which GPU to use.
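A minimal sketch of the queue-per-GPU idea from the job side, untested against a live cluster. It assumes hypothetical queue names "gpu0.q", "gpu1.q", ... (one slot per host each) and NVIDIA GPUs selected via CUDA_VISIBLE_DEVICES. Grid Engine exports the granted queue's name in $QUEUE; the helper name "gpu_index_from_queue" is made up for illustration.

```shell
#!/bin/sh
# Hypothetical naming scheme: one queue per GPU, e.g. "gpu0.q", "gpu1.q".
# Prints the GPU index encoded in a queue name, or returns 1 if the name
# does not follow that scheme.
gpu_index_from_queue() {
    name=${1%.q}        # strip the ".q" suffix:         "gpu1.q" -> "gpu1"
    idx=${name##*gpu}   # keep what follows "gpu":        "gpu1"   -> "1"
    case $idx in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: not a per-GPU queue
    esac
    printf '%s\n' "$idx"
}

# In the job script: $QUEUE is set by Grid Engine to the granted queue name.
# CUDA_VISIBLE_DEVICES is an assumption (NVIDIA/CUDA applications).
if idx=$(gpu_index_from_queue "${QUEUE:-}"); then
    CUDA_VISIBLE_DEVICES=$idx
    export CUDA_VISIBLE_DEVICES
fi
```

Because each per-GPU queue has a single slot, the scheduler itself prevents two jobs from landing on the same GPU, which is the over-subscription problem from the original post.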

-- Reuti

*) The "-w e" check is even performed twice: once before the JSV and once
after. In my opinion this is not optimal, as it prevents submitting a
completely malformed request and putting things in order inside the JSV.
Admittedly, one problem is the set of fields fed to the JSV: how would a
missing integer value be expressed (other than IEEE values like NaN and alike)?


> 
> Thanks,
> 
> Mark
> ___
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users




[gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

2017-12-21 Thread bergman
In our cluster, we've got several different types of GPUs.

Some jobs simply need any GPU, while others require a specific type.

Previously, we had "gpu" declared as a BOOLEAN attribute on each GPU-node
and had the GPU type (i.e., TITANX, P100, etc.) declared as an INT attribute
set to the number of GPUs of that type per node.

For example:

  qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node1
  qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node2
  qconf -aattr exechost complex_values gpu=TRUE,P100=2 node3
  qconf -aattr exechost complex_values gpu=TRUE,P40=1 node4

A user could submit:
  qsub -l gpu myjob
and it could run on any of the nodes, or a user could run:
  qsub -l TITANX=1 myjob
and it could run on node1 or node2.

However, this led to over-subscription, as the 'gpu' BOOLEAN isn't a
consumable resource.

I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
making it a consumable resource, and updating our JSV (in Perl) so that
if the job is submitted as

  qsub -l gpu foobar

it will be altered to the equivalent of

  qsub -l gpu=1 foobar

to keep things easy for users.
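For illustration, the complex change in this plan might look roughly like the following, with per-host counts taken from the TITANX/P100/P40 values above. This is an untested sketch. The "qconf -mc" row is shown as a comment because that command opens an editor. The order of the steps may also need adjusting if qconf refuses to change the type of "gpu" while hosts still carry gpu=TRUE.

```shell
# Complex configuration (edit with "qconf -mc"): turn the "gpu" row from a
# BOOL into a consumable INT. Columns:
#   name  shortcut  type  relop  requestable  consumable  default  urgency
#   gpu   gpu       INT   <=     YES          YES         0        0
#
# Per-host capacity: the number of GPUs in each node.
qconf -mattr exechost complex_values gpu=1 node1
qconf -mattr exechost complex_values gpu=1 node2
qconf -mattr exechost complex_values gpu=2 node3
qconf -mattr exechost complex_values gpu=1 node4
```

With "gpu" consumable, "qsub -l gpu=1" decrements the host's count while the job runs, so a two-GPU node such as node3 accepts at most two such jobs at a time.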

Any suggestions about this plan?
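The rewrite itself, where a bare "gpu" becomes "gpu=1", is a small string transformation on the comma-separated value of an -l option. The POSIX shell function below is a hypothetical illustration, not the Perl JSV from this post. As Reuti's reply explains, qsub may reject a bare "gpu" against an INT complex before any JSV sees it. In practice the rewrite would therefore have to happen before qsub parses the request, for example in a site-local wrapper, which is also hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a bare "gpu" entry in a comma-separated
# -l resource list to "gpu=1"; every other entry passes through unchanged.
normalize_gpu_request() {
    rest=$1 out=
    while [ -n "$rest" ]; do
        item=${rest%%,*}                 # first entry of the remaining list
        case $rest in
            *,*) rest=${rest#*,} ;;      # drop that entry and its comma
            *)   rest= ;;                # it was the last entry
        esac
        [ "$item" = gpu ] && item=gpu=1  # bare request -> explicit count of one
        out=${out:+$out,}$item           # re-join with commas
    done
    printf '%s\n' "$out"
}
```

For example, "normalize_gpu_request gpu,h_vmem=4G" prints "gpu=1,h_vmem=4G", while "TITANX=1" and "gpu=2" are returned unchanged.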

Thanks,

Mark