Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Dj Merrill
Thanks everyone for the feedback. I found this on GitHub that looks promising: https://github.com/RSE-Sheffield/sge-gpuprolog and this to go with it: https://gist.github.com/willfurnass/10277756070c4f374e6149a281324841 I can probably edit the scripts to also change the permissions on the /dev/nvidia* device files.
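
(A rough sketch of what such an edit might look like; the variable names below, e.g. $ASSIGNED_GPUS and $SGE_O_LOGNAME, are assumptions about what the prolog already knows, not something taken from the linked scripts.)

    # Hypothetical addition to a GPU prolog: once the prolog has decided which
    # GPU indices the job may use (assumed here to be in $ASSIGNED_GPUS, e.g. "0 2"),
    # also grant the job owner read/write access to just those device files.
    for idx in $ASSIGNED_GPUS; do
        setfacl -m "u:${SGE_O_LOGNAME}:rw" "/dev/nvidia${idx}"
    done
    # The matching epilog would remove the ACL again:
    #   setfacl -x "u:${SGE_O_LOGNAME}" "/dev/nvidia${idx}"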

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Joshua Baker-LePain
On Wed, 14 Aug 2019 at 7:21am, Dj Merrill wrote: To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had single Nvidia GPU cards per compute node. We are contemplating the purchase of a single compute node that has multiple GPU cards in it, and want to ensure that running jobs only

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread bergman
In the message dated: Wed, 14 Aug 2019 10:21:12 -0400, the pithy ruminations from Dj Merrill on [[gridengine users] Multi-GPU setup] were: => To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had single Nvidia GPU cards per compute node. We are contemplating the purchase of a

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Ian Kaufman
You could probably do this using consumables and resource quotas to enforce them. Ian On Wed, Aug 14, 2019 at 8:34 AM Christopher Heiny wrote: > On Wed, 2019-08-14 at 16:35 +0200, Andreas Haupt wrote: > > Hi Dj, we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and
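
(For anyone new to that approach, a hedged sketch of the moving parts; the complex name 'gpu' and the host/limit values are illustrative only.)

    # 1. Define a consumable complex (qconf -mc), one line roughly like:
    #      gpu   gpu   INT   <=   YES   YES   0   0
    # 2. Declare how many GPUs each host provides (qconf -me <host>):
    #      complex_values   gpu=4
    # 3. Optionally cap usage with a resource quota set (qconf -arqs):
    #      { name gpu_quota  enabled TRUE  limit users {*} to gpu=2 }
    # 4. Jobs then request GPUs at submission time:
    qsub -l gpu=1 myjob.sh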

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Christopher Heiny
On Wed, 2019-08-14 at 16:35 +0200, Andreas Haupt wrote: > Hi Dj, we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and according to what has been requested by the job). Preventing access to the 'wrong' gpu devices by "malicious jobs" is not that easy. An idea could be

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Friedrich Ferstl
Yes, UGE supports this out of the box. Depending on whether the job is a regular job or a Docker container, the method used to restrict access to only the assigned GPU is slightly different. UGE will also only schedule jobs to nodes where it is guaranteed to be able to do this. The interface

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Tina Friedrich
Hello, from a kernel/mechanism point of view, it is perfectly possible to restrict device access using cgroups. I use that on my current cluster and it works really well (both for things like CPU cores and GPUs - you only see what you request, even using something like 'nvidia-smi'). Sadly, my
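
(To illustrate the mechanism itself, independent of any scheduler integration, a minimal sketch assuming cgroup v1 with the devices controller mounted; 195 is the NVIDIA character device major number, and the job id and paths are made up.)

    # Create a per-job cgroup, deny all devices, then allow only GPU 0 plus the
    # NVIDIA control device; finally move the job's processes into the cgroup.
    mkdir -p /sys/fs/cgroup/devices/sge/job_12345
    echo 'a *:* rwm'    > /sys/fs/cgroup/devices/sge/job_12345/devices.deny
    echo 'c 195:0 rw'   > /sys/fs/cgroup/devices/sge/job_12345/devices.allow   # /dev/nvidia0
    echo 'c 195:255 rw' > /sys/fs/cgroup/devices/sge/job_12345/devices.allow   # /dev/nvidiactl
    # (/dev/nvidia-uvm has a dynamically assigned major and would need allowing too.)
    echo "$JOB_PID"     > /sys/fs/cgroup/devices/sge/job_12345/tasks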

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Nicolas FOURNIALS
Hi, On 14/08/2019 at 16:35, Andreas Haupt wrote: Preventing access to the 'wrong' gpu devices by "malicious jobs" is not that easy. An idea could be to e.g. play with device permissions. That's what we do by having /dev/nvidia[0-n] files owned by root and with permissions 660. Prolog
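
(A sketch of that baseline, with the caveat that the NVIDIA driver/udev normally recreates these nodes with mode 666, so the ownership and mode have to be enforced persistently, e.g. via a udev rule or a boot script; the exact method shown is an assumption.)

    # Lock down the per-GPU device nodes on each GPU host:
    chown root:root /dev/nvidia[0-9]*
    chmod 0660      /dev/nvidia[0-9]*
    # A prolog then grants the job owner access to only the assigned device(s),
    # e.g. via setfacl or chgrp, and the epilog revokes it again.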

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Skylar Thompson
Hi DJ, I'm not sure if SoGE supports it, but UGE has the concept of "resource maps" (aka RSMAP) complexes which we use to assign specific hardware resources to specific jobs. It functions sort of as a hybrid array/scalar consumable. It looks like this in the host complex_values configuration:
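
(The message is truncated here; as a hedged illustration of UGE RSMAP syntax, a host entry might look roughly like the following, with 'gpu' assumed to be defined as a complex of type RSMAP.)

    # qconf -me <gpu-node>  -- four GPUs, exposed as ids 0-3:
    #   complex_values   gpu=4(0 1 2 3)
    # A job submitted with "-l gpu=2" is granted two specific ids, which UGE
    # publishes to the job environment (e.g. $SGE_HGR_gpu) so a prolog or the
    # job itself can translate them into CUDA_VISIBLE_DEVICES.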

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Andreas Haupt
Hi Dj, we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and according to what has been requested by the job). Preventing access to the 'wrong' gpu devices by "malicious jobs" is not that easy. An idea could be to e.g. play with device permissions. Cheers, Andreas On Wed,
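
(A minimal, hypothetical prolog sketch of that idea; the free-GPU detection, the variable names, and the per-job spool file are assumptions, and real implementations such as the sge-gpuprolog scripts mentioned elsewhere in the thread do more bookkeeping and locking.)

    #!/bin/bash
    # Pick $NREQ GPUs that currently run no compute processes and stash the
    # indices where the job (or a wrapper) can export CUDA_VISIBLE_DEVICES.
    NREQ=${NREQ:-1}   # number of GPUs the job requested, however it is extracted
    FREE=$(nvidia-smi --query-gpu=index --format=csv,noheader | while read -r idx; do
               [ -z "$(nvidia-smi -i "$idx" --query-compute-apps=pid --format=csv,noheader)" ] && echo "$idx"
           done | head -n "$NREQ" | paste -sd, -)
    echo "export CUDA_VISIBLE_DEVICES=$FREE" > "$SGE_JOB_SPOOL_DIR/gpu_env.sh"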

[gridengine users] Multi-GPU setup

2019-08-14 Thread Dj Merrill
To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had single Nvidia GPU cards per compute node. We are contemplating the purchase of a single compute node that has multiple GPU cards in it, and want to ensure that running jobs only have access to the GPU resources they ask for, and cannot interfere with GPUs assigned to other jobs.