SGE has no knowledge of GPUs. Defining a consumable such as "ngpus" is one way to handle them, but SGE still does not know which GPU is assigned to which job (or process).
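(For anyone setting such a consumable up from scratch, it usually looks roughly like the following; the column layout is what "qconf -mc" expects and the names/values are only illustrative.)

    # add the consumable to the complex list (qconf -mc):
    #name    shortcut  type  relop  requestable  consumable  default  urgency
    ngpus    ngpus     INT   <=     YES          YES         0        0

    # tell SGE how many GPUs each execution host has (qconf -me <hostname>):
    complex_values    ngpus=2

    # request one GPU at submit time:
    qsub -l ngpus=1 job.sh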
What I did is write a script that assigns available GPU id(s) to a job (or MPI process), similar to an SGE load sensor, but placed in /etc/profile.d (RedHat Linux) so it runs for each process (when ssh is used as the qrsh transport), which is very useful for parallel GPU jobs. I also set the GPUs to "Exclusive Thread" compute mode. If you run a parallel job, your program needs to pick a usable GPU (assigned by the script and exported as $CUDA_VISIBLE_DEVICES) for each process; alternatively, do not set a GPU id in the program at all and CUDA will honour $CUDA_VISIBLE_DEVICES on its own. A rough sketch of such a script is at the end of this mail.

On Wed, Nov 19, 2014 at 11:41 AM, Kevin Taylor <[email protected]> wrote:
>
> The catch for us is that we want to know which GPU we're using for the code
> we have.
>
>
>> Date: Wed, 19 Nov 2014 16:58:32 +0100
>> From: [email protected]
>> To: [email protected]; [email protected]
>> Subject: Re: [gridengine users] Requesting a resource OR another resource
>>
>> Hi.
>>
>> You have two gpus on one host... why not define a consumable resource
>> gpu=2 and request it with -l gpu=1?
>>
>> The value of gpu will be decreased by one, and it would be possible for
>> another job to ask for the remaining gpu... or you could request two gpus
>> for one job, with -l gpu=2
>>
>> Best regards.
>> Robi
>>
>>
>> On 19.11.2014 14:44, Kevin Taylor wrote:
>> > I'm not sure if this is possible or not, but thought I'd ask it.
>> >
>> > We have a setup of consumable resources for our GPUs. If a system has
>> > two, we have complexes called gpu1_free and gpu2_free. They'll be equal
>> > to 1 if they're free and zero if they're not. Typically we just request
>> > like this: qsub -l gpu1_free=1 job.sh
>> >
>> > Is there a way through qsub to say
>> >
>> > qsub -l gpu1_free=1 OR gpu2_free=1 job.sh
>> >
>> > I know putting multiple -l's will ask for both, but we just want one or
>> > the other, whichever.
>> >
>> > Univa has that nice RSMAP feature that would solve our issue, but we
>> > haven't worked out finances on that yet, so we're seeing if we can just
>> > make it work a little easier with what we have.
>> >
>> > Thanks.
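For what it's worth, here is a minimal sketch of the kind of /etc/profile.d assignment script described above. It assumes nvidia-smi is available on the compute nodes and simply exports the first GPU that currently has no compute processes; the filename, the free-GPU test, any locking between simultaneously starting processes, and the number of GPUs per process will depend on your site, so treat it as a starting point rather than a finished implementation.

    # /etc/profile.d/cuda_gpu_assign.sh  (illustrative name)
    # Only act inside SGE jobs, and only if no GPU was already chosen.
    if [ -n "$JOB_ID" ] && [ -z "$CUDA_VISIBLE_DEVICES" ]; then
        free_gpus=""
        # Treat a GPU with no compute processes as free (works together
        # with "Exclusive Thread"/"Exclusive Process" compute mode).
        for id in $(nvidia-smi --query-gpu=index --format=csv,noheader); do
            procs=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader -i "$id" | wc -l)
            if [ "$procs" -eq 0 ]; then
                free_gpus="${free_gpus:+$free_gpus,}$id"
            fi
        done
        # Export one free GPU per process; adjust if a process needs several.
        export CUDA_VISIBLE_DEVICES=$(echo "$free_gpus" | cut -d, -f1)
    fi

With the GPUs in exclusive compute mode, a process that lands on a busy device anyway will fail to create a context, which gives a second layer of protection against two jobs sharing a GPU.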
