You could use a load sensor to do this. We use one to detect whether anyone is logged in, and to suspend/requeue jobs if someone logs in while a job is running on their workstation.
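A very rough sketch of what such a sensor could look like in Python (untested; assumes Python 2.7+, an nvidia-smi new enough to support --query-gpu, and a complex named gpu.mem_used that you'd define yourself -- SGE doesn't predefine any of this):

#!/usr/bin/env python
# GPU load sensor sketch for SGE: reports the total GPU memory in use
# on this host under a complex called "gpu.mem_used" (an example name;
# you'd define it yourself with qconf -mc).
import socket
import subprocess
import sys

HOST = socket.gethostname()

def gpu_mem_used_mb():
    # Assumes an nvidia-smi new enough for --query-gpu; older drivers
    # only offer "nvidia-smi -q", which needs different parsing.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"]).decode()
    # One number per line, one line per card; report the total.
    return sum(int(tok) for tok in out.split())

# SGE's load sensor protocol: block on stdin, exit on "quit", and
# answer every other line with a begin/end-wrapped report of
# host:complex:value lines.
while True:
    line = sys.stdin.readline()
    if not line or line.strip() == "quit":
        break
    print("begin")
    try:
        print("%s:gpu.mem_used:%dM" % (HOST, gpu_mem_used_mb()))
    except Exception:
        pass  # better to report nothing than a bogus value
    print("end")
    sys.stdout.flush()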
http://arc.liv.ac.uk/SGE/howto/loadsensor.html shows you how to make one properly. You'd then set your queue's load/suspend threshold to whatever you'd like (configurable per queue instance, or per host/hostgroup). You'd probably use nvidia-smi (assuming you're on Linux) to get the card details out and parse them into the load figure; a sketch of the qconf side is in the P.S. below.

There are a few more related details here:
http://serverfault.com/questions/322073/howto-set-up-sge-for-cuda-devices

On Tue, Apr 16, 2013 at 2:51 AM, Nicolás Serrano Martínez-Santos
<[email protected]> wrote:
> Hi,
>
> We are currently using SGE 6.2u5 on our little cluster (~150 cores), and I
> am trying to configure it to manage GPU usage correctly. I have been able
> to define multiple slots for each GPU card, and also to reserve memory
> using consumables.
>
> However, I still haven't managed to configure a consumable like h_vmem but
> for the GPU, so that a process is also killed if it exceeds a certain
> limit.
>
> I have looked around on the internet but haven't found anything similar;
> is it maybe part of a newer version?
>
> Thanks in advance,
>
> NiCo

--
Stephen
http://cerealkillers.co.uk
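P.S. A sketch of how the pieces might wire together, in case it saves someone some digging -- the names gpu.q, gpu.mem_used, and the sensor path are just examples, and the relop/threshold choices are mine, not anything from a working cluster:

# register the sensor on each GPU host (qconf -mconf <hostname>)
load_sensor            /opt/sge/util/gpu_loadsensor.py

# define the complex it reports (qconf -mc); one line in the table:
#name         shortcut  type    relop  requestable  consumable  default  urgency
gpu.mem_used  gmu       MEMORY  >=     YES          NO          0        0

# suspend jobs in the GPU queue when card memory use climbs too high
# (qconf -mq gpu.q)
suspend_thresholds     gpu.mem_used=5G

Note this only suspends jobs at the host level once the cards fill up; it won't give you the per-process kill that h_vmem does for main memory.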
