You could use a load sensor to do this. We use one to detect whether anyone is logged in, and to suspend/requeue jobs if someone logs in while a job is running on their workstation.
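A very rough sketch of what such a sensor could look like in Python (untested; assumes Python 2.7+, an nvidia-smi new enough to support --query-gpu, and a complex named gpu.mem_used that you'd define yourself -- SGE doesn't predefine any of this):

#!/usr/bin/env python
# GPU load sensor sketch for SGE: reports the total GPU memory in use
# on this host under a complex called "gpu.mem_used" (an example name;
# you'd define it yourself with qconf -mc).
import socket
import subprocess
import sys

HOST = socket.gethostname()

def gpu_mem_used_mb():
    # Assumes an nvidia-smi new enough for --query-gpu; older drivers
    # only offer "nvidia-smi -q", which needs different parsing.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"]).decode()
    # One number per line, one line per card; report the total.
    return sum(int(tok) for tok in out.split())

# SGE's load sensor protocol: block on stdin, exit on "quit", and
# answer every other line with a begin/end-wrapped report of
# host:complex:value lines.
while True:
    line = sys.stdin.readline()
    if not line or line.strip() == "quit":
        break
    print("begin")
    try:
        print("%s:gpu.mem_used:%dM" % (HOST, gpu_mem_used_mb()))
    except Exception:
        pass  # better to report nothing than a bogus value
    print("end")
    sys.stdout.flush()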
http://arc.liv.ac.uk/SGE/howto/loadsensor.html shows you how to make one properly. You'd then set your queue's load/suspend threshold to whatever you'd like (configurable per queue instance, or per host/hostgroup). You'd probably use nvidia-smi (assuming you're on Linux) to get the card details out and parse them into the load figure; a sketch of the qconf side is in the P.S. below.

There are a few more related details here:
http://serverfault.com/questions/322073/howto-set-up-sge-for-cuda-devices

On Tue, Apr 16, 2013 at 2:51 AM, Nicolás Serrano Martínez-Santos
<[email protected]> wrote:
> Hi,
>
> We are currently using SGE 6.2u5 on our little cluster (~150 cores), and I
> am trying to configure it to manage GPU usage correctly. I have been able
> to define multiple slots for each GPU card, and also to reserve memory
> using consumables.
>
> However, I still haven't managed to configure a consumable like h_vmem but
> for the GPU, so that a process is also killed if it exceeds a certain
> limit.
>
> I have looked around on the internet but haven't found anything similar;
> is it maybe part of a newer version?
>
> Thanks in advance,
>
> NiCo

--
Stephen
http://cerealkillers.co.uk
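P.S. A sketch of how the pieces might wire together, in case it saves someone some digging -- the names gpu.q, gpu.mem_used, and the sensor path are just examples, and the relop/threshold choices are mine, not anything from a working cluster:

# register the sensor on each GPU host (qconf -mconf <hostname>)
load_sensor            /opt/sge/util/gpu_loadsensor.py

# define the complex it reports (qconf -mc); one line in the table:
#name         shortcut  type    relop  requestable  consumable  default  urgency
gpu.mem_used  gmu       MEMORY  >=     YES          NO          0        0

# suspend jobs in the GPU queue when card memory use climbs too high
# (qconf -mq gpu.q)
suspend_thresholds     gpu.mem_used=5G

Note this only suspends jobs at the host level once the cards fill up; it won't give you the per-process kill that h_vmem does for main memory.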
