On Wed, 14 Aug 2019 at 7:21am, Dj Merrill wrote

To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
single Nvidia GPU cards per compute node.  We are contemplating the
purchase of a single compute node that has multiple GPU cards in it, and
want to ensure that running jobs only have access to the GPU resources
they ask for, and don't take over all of the GPU cards in the system.

We use prolog and epilog scripts based on <https://github.com/kyamagu/sge-gpuprolog> to assign GPUs to jobs. It's (obviously) up to the users' scripts to honor the assignments, but it's been working for us so far.
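To make the assignment idea concrete, here is a hypothetical sketch (not the actual sge-gpuprolog code) of the core logic such a prolog script needs: given the node's GPU count and the device IDs already claimed by other jobs, pick the free IDs so the job can be handed a CUDA_VISIBLE_DEVICES value.

```shell
#!/bin/sh
# Illustrative sketch only: free_gpus, the argument layout, and the
# example counts are assumptions, not sge-gpuprolog's real interface.

free_gpus() {
    total=$1      # number of GPUs physically in the node
    used=$2       # space-separated device IDs locked by other jobs
    i=0
    out=""
    while [ "$i" -lt "$total" ]; do
        case " $used " in
            *" $i "*) ;;                    # already taken, skip it
            *) out="${out}${out:+ }$i" ;;   # free, add to the list
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# 4 GPUs on the node, devices 0 and 2 in use by other jobs:
free_gpus 4 "0 2"
```

The real scripts track in-use devices with lock files per job and export the result as CUDA_VISIBLE_DEVICES; the honor-system caveat above is because nothing stops a user's code from ignoring that variable.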

We define gpu as a resource:
qconf -sc:
#name   shortcut   type   relop   requestable   consumable   default   urgency
gpu     gpu        INT    <=      YES           YES          0         0
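With the consumable defined, it gets attached per exec host and requested per job, roughly like this (the hostname "gpunode01" and the count of 4 are placeholders, not from the thread):

```shell
# Set the consumable on the exec host to the number of GPUs in the node:
qconf -mattr exechost complex_values gpu=4 gpunode01

# Jobs request only the GPUs they need; SGE decrements the consumable
# and stops scheduling GPU jobs to the host once it hits zero:
qsub -l gpu=2 my_job.sh
```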

We *used* to run this way until we ran into what seems like a bug in SoGE 8.1.9. See <http://gridengine.org/pipermail/users/2018-April/010116.html> and the ensuing thread for details, but the short version is that SGE would insist on trying to run a job on one particular node even when there were free GPUs elsewhere. It happened so often that we had to change our approach: we now define a queue on each GPU node with the same number of slots as GPUs. It's far from a perfect system, but it's working for now.
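For reference, the workaround amounts to something like the following (queue and host names are placeholders; the slots value must match the node's GPU count):

```shell
# Create a per-node queue; in the editor that opens, bind it to the
# node and cap slots at the GPU count so at most that many GPU jobs
# can land there at once:
qconf -aq gpu-gpunode01.q
#   hostlist   gpunode01
#   slots      4
```

The trade-off is that slots now stand in for GPUs, so the queue can't distinguish a job wanting two GPUs from two jobs wanting one each without additional resource requests.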

Joshua Baker-LePain
QB3 Shared Cluster Sysadmin