Re: [gridengine users] Multi-GPU setup

2019-08-28 Thread Hay, William
On Wed, Aug 14, 2019 at 05:11:02PM +0200, Nicolas FOURNIALS wrote:
> Hi,
>
> On 14/08/2019 at 16:35, Andreas Haupt wrote:
> > Preventing access to the 'wrong' gpu devices by "malicious jobs" is not
> > that easy. An idea could be to e.g. play with device permissions.
>
> That's what we do by having…

Re: [gridengine users] Multi-GPU setup

2019-08-20 Thread Dj Merrill
Apologies, I should have followed up on this. It looks like they've already started work on handling the NVidia device permissions. Look under the branches section, and there are useful notes in both the "hardened" and "nvidia_dev_chgrp" branches. https://github.com/RSE-Sheffield/sge-gpuprolog/b…

Re: [gridengine users] Multi-GPU setup

2019-08-20 Thread Nicolas FOURNIALS
On 14/08/2019 at 19:50, Dj Merrill wrote:
> Thanks everyone for the feedback. I found this on Github that looks
> promising: https://github.com/RSE-Sheffield/sge-gpuprolog

Thanks for pointing it out.

> I can probably edit the scripts to also change the permissions on the
> /dev/nvidia* devices as…

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Dj Merrill
Thanks everyone for the feedback. I found this on Github that looks promising:
https://github.com/RSE-Sheffield/sge-gpuprolog
and this to go with it:
https://gist.github.com/willfurnass/10277756070c4f374e6149a281324841
I can probably edit the scripts to also change the permissions on the /dev/nvidia* devices as…

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Joshua Baker-LePain
On Wed, 14 Aug 2019 at 7:21am, Dj Merrill wrote:
> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
> single Nvidia GPU cards per compute node. We are contemplating the
> purchase of a single compute node that has multiple GPU cards in it,
> and want to ensure that running jobs only have access to the GPU
> resources they ask for…

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread bergman
In the message dated: Wed, 14 Aug 2019 10:21:12 -0400,
The pithy ruminations from Dj Merrill on
[[gridengine users] Multi-GPU setup] were:
=> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
=> single Nvidia GPU cards per compute node. We are contemplating the
=> purchase of a single compute node that has multiple GPU cards in it…

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Ian Kaufman
You could probably do this using consumables and resource quotas to enforce them.

Ian

On Wed, Aug 14, 2019 at 8:34 AM Christopher Heiny wrote:
> On Wed, 2019-08-14 at 16:35 +0200, Andreas Haupt wrote:
> > Hi Dj,
> >
> > we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and
> > according to what has been requested by the job)…
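
A minimal sketch of the consumable-plus-quota approach Ian mentions; the complex name "gpu", the host name, and the limit values are illustrative assumptions, not details from this thread:

    # qconf -mc: a per-host consumable counting GPUs
    #name  shortcut  type  relop  requestable  consumable  default  urgency
    gpu    gpu       INT   <=     YES          YES         0        0

    # qconf -me gpunode01: this node offers 4 GPUs
    complex_values    gpu=4

    # qconf -arqs: resource quota capping GPU use per user
    {
       name         max_user_gpus
       description  "No user may hold more than 4 GPUs at once"
       enabled      TRUE
       limit        users {*} to gpu=4
    }

A plain INT consumable only counts devices, though; it does not tell a job which GPU it was granted, which is why the rest of the thread turns to prolog scripts, resource maps, and cgroups.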

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Christopher Heiny
On Wed, 2019-08-14 at 16:35 +0200, Andreas Haupt wrote:
> Hi Dj,
>
> we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and
> according to what has been requested by the job).
>
> Preventing access to the 'wrong' gpu devices by "malicious jobs" is not
> that easy. An idea could be to e.g. play with device permissions…

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Friedrich Ferstl
Yes, UGE supports this out of the box. Depending on whether the job is a regular job or a Docker container, the method used to restrict access only to the assigned GPU is slightly different. UGE will also only schedule jobs onto nodes where it is guaranteed to be able to do this. The interface for…

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Tina Friedrich
Hello, from a kernel/mechanism point of view it is perfectly possible to restrict device access using cgroups. I use that on my current cluster and it works really well (both for things like CPU cores and GPUs - you only see what you request, even using something like 'nvidia-smi'). Sadly, my current…
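
For context, a hand-run sketch of the cgroup-v1 devices-controller mechanism Tina is referring to; a real integration would do this from the scheduler's prolog/epilog, and the cgroup name, $JOB_PID, and the choice of GPU 0 are assumptions:

    # create a devices cgroup for the job and deny everything by default
    mkdir /sys/fs/cgroup/devices/job_12345
    echo 'a *:* rwm' > /sys/fs/cgroup/devices/job_12345/devices.deny

    # re-allow basic character devices (/dev/null, /dev/zero, ...)
    echo 'c 1:* rwm' > /sys/fs/cgroup/devices/job_12345/devices.allow
    # ... and only GPU 0; NVIDIA devices use major 195, and 195:255 is
    # /dev/nvidiactl, which the driver always needs
    echo 'c 195:0 rw'   > /sys/fs/cgroup/devices/job_12345/devices.allow
    echo 'c 195:255 rw' > /sys/fs/cgroup/devices/job_12345/devices.allow

    # move the job's shell into the cgroup; children inherit it
    echo "$JOB_PID" > /sys/fs/cgroup/devices/job_12345/tasks

In practice you would also need to allow ttys, pts, and /dev/nvidia-uvm (whose major number is assigned dynamically). Processes in the cgroup then get EPERM on the other /dev/nvidia* nodes, so, as Tina notes, nvidia-smi only shows the allowed device.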

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Nicolas FOURNIALS
Hi,

On 14/08/2019 at 16:35, Andreas Haupt wrote:
> Preventing access to the 'wrong' gpu devices by "malicious jobs" is not
> that easy. An idea could be to e.g. play with device permissions.

That's what we do by having /dev/nvidia[0-n] files owned by root and with permissions 660. Prolog (executed…
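
Since Nicolas' message is cut off, here is a minimal sketch of the permission dance he describes. It assumes the prolog/epilog are configured to run as root (root@/path/to/script in qconf -mconf) and that the assigned device numbers arrive in a GPU_IDS variable; both are assumptions, not details from his setup:

    # prolog fragment (as root): hand the assigned devices
    # to the job owner's primary group
    for dev in $GPU_IDS; do
        chgrp "$(id -gn "$USER")" "/dev/nvidia$dev"
    done

    # epilog fragment (as root): return the devices to root:root
    for dev in $GPU_IDS; do
        chgrp root "/dev/nvidia$dev"
        chmod 660  "/dev/nvidia$dev"
    done

With the devices at root:root 660 by default, a job can only open the nodes its prolog handed over, regardless of what CUDA_VISIBLE_DEVICES says.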

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Skylar Thompson
Hi DJ, I'm not sure if SoGE supports it, but UGE has the concept of "resource maps" (aka RSMAP) complexes, which we use to assign specific hardware resources to specific jobs. It functions sort of as a hybrid array/scalar consumable. It looks like this in the host complex_values configuration: cu…
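
Skylar's configuration line is truncated above, so here is only a hedged sketch of what a UGE resource map typically looks like; the complex name "gpu", the host name, and the device ids are assumptions:

    # qconf -mc: an RSMAP complex holding per-device ids
    #name  shortcut  type   relop  requestable  consumable  default  urgency
    gpu    gpu       RSMAP  <=     YES          YES         NONE     0

    # qconf -me gpunode01: two GPUs, with ids 0 and 1
    complex_values    gpu=2(0 1)

A job then requests e.g. -l gpu=1; UGE picks a free id from the map and (as far as I know) exposes it to the job in $SGE_HGR_gpu, which a wrapper or prolog can copy into CUDA_VISIBLE_DEVICES.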

Re: [gridengine users] Multi-GPU setup

2019-08-14 Thread Andreas Haupt
Hi Dj,

we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and according to what has been requested by the job).

Preventing access to the 'wrong' gpu devices by "malicious jobs" is not that easy. An idea could be to e.g. play with device permissions.

Cheers,
Andreas

On Wed, 2019-08-…
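
Andreas' script itself isn't shown, so the following is only an illustrative sketch of the prolog idea, under several assumptions: one GPU per job, four devices, a lock directory the prolog can write to, and a job script that sources the file left in $TMPDIR (a variable exported inside a prolog does not propagate to the job on its own):

    #!/bin/bash
    # prolog sketch: grab the first free GPU via an atomic mkdir lock
    LOCKROOT=/var/run/gpu-locks        # assumed location
    mkdir -p "$LOCKROOT"
    for dev in 0 1 2 3; do             # assumed device ids
        if mkdir "$LOCKROOT/gpu$dev" 2>/dev/null; then
            echo "$JOB_ID" > "$LOCKROOT/gpu$dev/owner"
            # the job script sources this file to pick up its device
            echo "export CUDA_VISIBLE_DEVICES=$dev" > "$TMPDIR/gpu_env.sh"
            exit 0
        fi
    done
    exit 100      # no free GPU: exit code 100 puts the job in error state

A matching epilog removes the lock directory again. Note that this only steers well-behaved jobs: nothing stops a job from ignoring CUDA_VISIBLE_DEVICES, which is exactly the "malicious jobs" problem the rest of the thread is about.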

[gridengine users] Multi-GPU setup

2019-08-14 Thread Dj Merrill
To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had single Nvidia GPU cards per compute node. We are contemplating the purchase of a single compute node that has multiple GPU cards in it, and want to ensure that running jobs only have access to the GPU resources they ask for, and d…