You could probably do this using consumables, with resource quotas to enforce them.
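
For illustration, a resource quota set along these lines could enforce such a cap (a minimal sketch, assuming the "gpu" consumable defined later in this thread; the rule set name and the limit of 4 are illustrative):

    # qconf -arqs gpu_host_limit
    {
       name         gpu_host_limit
       description  "Never hand out more gpus than a host has"
       enabled      TRUE
       limit        hosts {*} to gpu=4
    }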
Ian

On Wed, Aug 14, 2019 at 8:34 AM Christopher Heiny <che...@synaptics.com> wrote:
> On Wed, 2019-08-14 at 16:35 +0200, Andreas Haupt wrote:
> > Hi Dj,
> >
> > we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script
> > (according to what has been requested by the job).
> >
> > Preventing access to the 'wrong' GPU devices by "malicious jobs" is
> > not that easy. One idea would be to play with device permissions,
> > for example.
>
> We use the same approach on our SGE 8.1.9 cluster, with consumables
> for the number of GPUs needed and the GPU RAM required, and other
> requestable attributes for GPU model, CUDA level, and so on.
>
> Fortunately, the user base is small and very cooperative, so at this
> time I'm not worried about malicious users.
>
> Cheers,
> Chris
>
> > Cheers,
> > Andreas
> >
> > On Wed, 2019-08-14 at 10:21 -0400, Dj Merrill wrote:
> > > To date in our HPC grid running Son of Grid Engine 8.1.9, we've
> > > had a single Nvidia GPU card per compute node. We are
> > > contemplating the purchase of a single compute node that has
> > > multiple GPU cards in it, and we want to ensure that running jobs
> > > only have access to the GPU resources they ask for and don't take
> > > over all of the GPU cards in the system.
> > >
> > > We define gpu as a resource (qconf -sc):
> > >
> > > #name  shortcut  type  relop  requestable  consumable  default  urgency
> > > gpu    gpu       INT   <=     YES          YES         0        0
> > >
> > > We enable GPU persistence mode and exclusive-process compute mode
> > > on each node:
> > >
> > > nvidia-smi -pm 1
> > > nvidia-smi -c 3
> > >
> > > We set the number of GPUs in the host definition (qconf -me
> > > (hostname)) with:
> > >
> > > complex_values gpu=1
> > >
> > > for our existing nodes, and this setup has been working fine for
> > > us. With the new system, we would set:
> > >
> > > complex_values gpu=4
> > >
> > > If a job is submitted asking for one GPU, will it be limited to
> > > only having access to a single GPU card on the system, or can it
> > > detect the other cards and take up all four (and if so, how do we
> > > prevent that)?
> > >
> > > Is there something like "cgroups" for GPUs?
> > >
> > > Thanks,
> > >
> > > -Dj
> >
> > --
> > Andreas Haupt    | E-Mail: andreas.ha...@desy.de
> > DESY Zeuthen     | WWW:    http://www-zeuthen.desy.de/~ahaupt
> > Platanenallee 6  | Phone:  +49/33762/7-7359
> > D-15738 Zeuthen  | Fax:    +49/33762/7-7216

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu
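
A prolog in the spirit of Andreas' and Chris' approach might look roughly like this (a sketch only: the lock directory, the fixed device list, and the spool-environment trick are assumptions to verify against your own SGE installation):

    #!/bin/sh
    # Sketch of a GPU-assigning prolog. JOB_ID and SGE_JOB_SPOOL_DIR
    # are set by SGE in the prolog environment; everything else here
    # is illustrative.
    LOCKDIR=/var/run/gpu-locks        # assumed path
    mkdir -p "$LOCKDIR"

    # How many GPUs did the job request? (crude parse of qstat output)
    NGPUS=$(qstat -j "$JOB_ID" | sed -n 's/.*gpu=\([0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -z "$NGPUS" ] && exit 0         # no GPUs requested, nothing to do

    ASSIGNED=""
    COUNT=0
    for DEV in 0 1 2 3; do            # adjust to the node's device count
        [ "$COUNT" -ge "$NGPUS" ] && break
        # mkdir is atomic, so it doubles as a per-device lock.
        if mkdir "$LOCKDIR/gpu$DEV" 2>/dev/null; then
            echo "$JOB_ID" > "$LOCKDIR/gpu$DEV/owner"   # epilog cleans up
            ASSIGNED="${ASSIGNED:+$ASSIGNED,}$DEV"
            COUNT=$((COUNT + 1))
        fi
    done
    # A production script would fail the job here if COUNT < NGPUS.

    # Expose only the assigned devices by appending to the job's
    # environment file in the spool area (verify this works on your
    # SGE/SoGE version before relying on it).
    echo "CUDA_VISIBLE_DEVICES=$ASSIGNED" >> "$SGE_JOB_SPOOL_DIR/environment"

A matching epilog would remove the job's lock directories. Note that this only steers cooperative jobs; it does not stop a process from opening the other /dev/nvidia* nodes.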
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
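
On the closing question of something like cgroups for GPUs: the Linux cgroup v1 devices controller can restrict which device nodes a job's processes may open, along the device-permission lines Andreas mentions. A rough, untested sketch for a prolog (assuming cgroup v1 is mounted at /sys/fs/cgroup/devices and using NVIDIA's character-device major number 195):

    # Create a per-job device cgroup and confine it to GPU 0 only.
    CG=/sys/fs/cgroup/devices/sge/job_$JOB_ID     # illustrative hierarchy
    mkdir -p "$CG"
    echo 'c 195:* rwm' > "$CG/devices.deny"       # deny all NVIDIA devices
    echo 'c 195:0 rwm' > "$CG/devices.allow"      # re-allow /dev/nvidia0
    echo 'c 195:255 rwm' > "$CG/devices.allow"    # /dev/nvidiactl is still needed
    # Assumes the prolog runs as a direct child of sge_shepherd, so
    # moving our parent into the cgroup makes the job inherit it.
    echo "$PPID" > "$CG/tasks"

/dev/nvidia-uvm uses a dynamically assigned major number, so it may need an extra allow rule on CUDA setups.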