On Wed, 2019-08-14 at 16:35 +0200, Andreas Haupt wrote:
> Hi Dj,
> 
> we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and
> according to what has been requested by the job).
> 
> Preventing access to the 'wrong' gpu devices by "malicious jobs" is
> not
> that easy. An idea could be to e.g. play with device permissions.
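For what it's worth, a prolog along those lines might look roughly like this (a sketch only: pick_gpus, NGPUS and the idea of writing the assignment out for the job to source are illustrative, not SGE built-ins):

```shell
#!/bin/sh
# Sketch of a prolog that assigns the first N device indices to a job.

# Return the first $2 device indices from the space-separated list $1,
# joined with commas, ready for CUDA_VISIBLE_DEVICES.
pick_gpus() {
    echo $1 | tr ' ' '\n' | head -n "$2" | paste -sd, -
}

# On a real node the device list would come from the driver, e.g.:
#   devs=$(nvidia-smi --query-gpu=index --format=csv,noheader | tr '\n' ' ')
devs="0 1 2 3"            # stand-in for the nvidia-smi query
ngpus="${NGPUS:-1}"       # would be parsed from the job's gpu request

# A prolog can't export into the job's environment directly, so one
# pattern is to write the assignment somewhere the job (or a starter
# method) can source it:
echo "export CUDA_VISIBLE_DEVICES=$(pick_gpus "$devs" "$ngpus")"
```

The bookkeeping of which devices are actually free (not just present) is the hard part that the consumable alone doesn't give you.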


We use the same approach on our SGE 8.1.9 cluster, with consumables for
the number of GPUs and the amount of GPU RAM required, plus other
requestable attributes for GPU model, CUDA level and so on.
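For the curious, the extra requestables look something like this in
qconf -sc (the gpu_ram/gpu_model names, shortcuts and values here are
illustrative rather than our exact definitions):

```
#name      shortcut  type      relop  requestable  consumable  default  urgency
gpu        gpu       INT       <=     YES          YES         0        0
gpu_ram    gram      MEMORY    <=     YES          YES         0        0
gpu_model  gmodel    RESTRING  ==     YES          NO          NONE     0
```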

Fortunately, the user base is small and very cooperative, so at this
time I'm not worried about malicious users.

                                        Cheers,
                                                Chris

> 
> Cheers,
> Andreas
> 
> On Wed, 2019-08-14 at 10:21 -0400, Dj Merrill wrote:
> > To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
> > single Nvidia GPU cards per compute node.  We are contemplating the
> > purchase of a single compute node that has multiple GPU cards in it,
> > and want to ensure that running jobs only have access to the GPU
> > resources they ask for, and don't take over all of the GPU cards in
> > the system.
> > 
> > We define gpu as a resource:
> > qconf -sc:
> > #name   shortcut  type  relop  requestable  consumable  default  urgency
> > gpu     gpu       INT   <=     YES          YES         0        0
> > 
> > We define GPU persistence mode and exclusive process on each node:
> > nvidia-smi -pm 1
> > nvidia-smi -c 3
> > 
> > We set the number of GPUs in the host definition:
> > qconf -me (hostname)
> > 
> > complex_values   gpu=1   for our existing nodes, and this setup has
> > been working fine for us.
> > 
> > With the new system, we would set:
> > complex_values   gpu=4
> > 
> > 
> > If a job is submitted asking for one GPU, will it be limited to
> > having access to a single GPU card on the system, or can it detect
> > the other cards and take up all four (and if so, how do we prevent
> > that)?
> > 
> > Is there something like "cgroups" for gpus?
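Not that I know of in stock SGE 8.1.9, but the device-permissions idea
Andreas mentions can be sketched with the cgroup v1 devices controller.
Everything below is illustrative: the sge/$JOB_ID hierarchy is made up,
and CGROUP_ROOT defaults to a scratch directory so the sketch is
harmless to run; on a real node it would be /sys/fs/cgroup/devices and
need root. NVIDIA GPUs are character devices with major 195 and minor
equal to the device index.

```shell
#!/bin/sh
# Hypothetical per-job GPU restriction via the cgroup v1 devices
# controller (sketch; see caveats above).
CGROUP_ROOT="${CGROUP_ROOT:-$(mktemp -d)}"   # real nodes: /sys/fs/cgroup/devices
JOB_ID="${JOB_ID:-12345}"                    # hypothetical job id
cg="$CGROUP_ROOT/sge/$JOB_ID"
mkdir -p "$cg"
echo 'c 195:* rwm' > "$cg/devices.deny"      # deny all NVIDIA GPUs...
echo 'c 195:0 rwm' > "$cg/devices.allow"     # ...then allow only GPU 0
# echo $$ > "$cg/tasks"                      # finally move the job in
```

On a real devices cgroup the deny/allow writes are interpreted by the
kernel rather than stored as file contents, so treat this purely as the
shape of the approach.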
> > 
> > Thanks,
> > 
> > -Dj
> > 
> > 
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
> 
> --
> Andreas Haupt            | E-Mail: andreas.ha...@desy.de
> DESY Zeuthen             | WWW:    http://www-zeuthen.desy.de/~ahaupt
> Platanenallee 6          | Phone:  +49/33762/7-7359
> D-15738 Zeuthen          | Fax:    +49/33762/7-7216
> 


