You could probably do this using consumables and resource quotas to
enforce them.
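
For example, a resource quota set added with qconf -arqs could cap the
gpu consumable per user (a sketch only, assuming the gpu complex
defined below; adjust the limit rule to your policy):

{
   name         gpu_quota
   description  "Cap concurrent GPU use per user"
   enabled      TRUE
   limit        users {*} to gpu=4
}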

Ian

On Wed, Aug 14, 2019 at 8:34 AM Christopher Heiny <che...@synaptics.com>
wrote:

> On Wed, 2019-08-14 at 16:35 +0200, Andreas Haupt wrote:
> > Hi Dj,
> >
> > we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script,
> > according to what the job has requested.
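> >
> > A stripped-down sketch of such a prolog (the free-GPU bookkeeping
> > helper here is hypothetical, and whether the shepherd re-reads the
> > spooled environment file after the prolog runs is version-dependent,
> > so test it on your installation):
> >
> > #!/bin/sh
> > # Determine how many GPUs the job requested (hard resource list).
> > NGPU=$(qstat -j "$JOB_ID" | sed -n 's/.*gpu=\([0-9][0-9]*\).*/\1/p' | head -1)
> > : "${NGPU:=1}"
> > # pick_free_gpus is a hypothetical site helper returning e.g. "0,2".
> > DEVICES=$(pick_free_gpus "$NGPU")
> > # Hand the variable to the job via the shepherd's environment file.
> > echo "CUDA_VISIBLE_DEVICES=$DEVICES" >> "$SGE_JOB_SPOOL_DIR/environment"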
> >
> > Preventing "malicious" jobs from accessing the wrong GPU devices is
> > not that easy. One idea would be to play with device permissions.
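> >
> > For instance, with a cgroup v1 devices controller and a per-job
> > cgroup that the prolog creates and moves the job into (a sketch;
> > NVIDIA GPUs are character devices with major number 195):
> >
> > # Deny all NVIDIA GPUs in the job's cgroup ...
> > echo 'c 195:* rwm' > /sys/fs/cgroup/devices/sge/$JOB_ID/devices.deny
> > # ... then re-allow only the assigned device (here GPU 0) plus
> > # /dev/nvidiactl (minor 255), which CUDA always needs.
> > echo 'c 195:0 rwm'   > /sys/fs/cgroup/devices/sge/$JOB_ID/devices.allow
> > echo 'c 195:255 rwm' > /sys/fs/cgroup/devices/sge/$JOB_ID/devices.allow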
>
>
> We use the same approach on our SGE 8.1.9 cluster, with consumables for
> the number of GPUs needed and the GPU RAM required, plus other
> requestable attributes for GPU model, CUDA level and so on.
>
> Fortunately, the user base is small and very cooperative, so at this
> time I'm not worried about malicious users.
>
>                                         Cheers,
>                                                 Chris
>
> >
> > Cheers,
> > Andreas
> >
> > On Wed, 2019-08-14 at 10:21 -0400, Dj Merrill wrote:
> > > To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
> > > single Nvidia GPU cards per compute node.  We are contemplating the
> > > purchase of a single compute node with multiple GPU cards, and we
> > > want to ensure that running jobs only have access to the GPU
> > > resources they ask for, and don't take over all of the GPU cards in
> > > the system.
> > >
> > > We define gpu as a resource:
> > > qconf -sc:
> > > #name  shortcut  type  relop  requestable  consumable  default  urgency
> > > gpu    gpu       INT   <=     YES          YES          0        0
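> > >
> > > Jobs then request the consumable at submit time, e.g.:
> > >
> > > qsub -l gpu=1 job.sh
> > >
> > > and the scheduler decrements gpu on the chosen host accordingly.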
> > >
> > > We set GPU persistence mode and exclusive-process compute mode on
> > > each node:
> > > nvidia-smi -pm 1   # enable persistence mode
> > > nvidia-smi -c 3    # 3 = EXCLUSIVE_PROCESS: one process per GPU
> > >
> > > We set the number of GPUs in the host definition:
> > >
> > > qconf -me (hostname)
> > > complex_values   gpu=1
> > >
> > > This setup has been working fine for our existing nodes.
> > >
> > > With the new system, we would set:
> > > complex_values   gpu=4
> > >
> > >
> > > If a job is submitted asking for one GPU, will it be limited to a
> > > single GPU card on the system, or can it detect the other cards and
> > > take up all four (and if so, how do we prevent that)?
> > >
> > > Is there something like "cgroups" for gpus?
> > >
> > > Thanks,
> > >
> > > -Dj
> > >
> > >
> >
> > --
> > Andreas Haupt     | E-Mail: andreas.ha...@desy.de
> > DESY Zeuthen      | WWW:    http://www-zeuthen.desy.de/~ahaupt
> > Platanenallee 6   | Phone:  +49/33762/7-7359
> > D-15738 Zeuthen   | Fax:    +49/33762/7-7216
> >
>


-- 
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
