You might find the instructions here useful to implement Ian's approach: http://sgowtham.com/journal/2012/12/18/sge-scheduling-gpu-jobs-on-rocks-5-4-2/
I have had the same instructions work for me on Rocks 6.1.1 running GE 2011.11p1.

Best regards,
g

--
Gowtham, PhD
HPC Research Scientist, ITS
Adj. Asst. Professor, Physics/ECE
Michigan Technological University

(906) 487/3593
http://it.mtu.edu
http://hpc.mtu.edu

On Mon, 14 Apr 2014, Ian Kaufman wrote:

| I think you can make it a consumable resource, such that once a
| specific GPU on a specific host is in use, no other jobs can land on
| it.
|
| Ian
|
| On Mon, Apr 14, 2014 at 11:06 AM, Feng Zhang <prod.f...@gmail.com> wrote:
| > Thanks, Ian!
| >
| > I haven't checked the GPU load sensor in detail, either. It sounds to
| > me like it only handles the number of GPUs allocated to a job, but the
| > job doesn't know which GPUs it actually gets, so it cannot set
| > CUDA_VISIBLE_DEVICES (some programs need this env to be set). This can
| > be done by writing some scripts/programs, but to me it is not an
| > accurate solution, since some jobs may still happen to collide with
| > each other on the same GPU on a multi-GPU node. If GE could keep a
| > record of the GPUs allocated to each job, that would be perfect.
| >
| >
| > On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
| >> I believe there already is support for GPUs - there is a GPU Load
| >> Sensor in Open Grid Engine. You may have to build it yourself; I
| >> haven't checked to see if it comes pre-packaged.
| >>
| >> Univa has Phi support, and I believe OGE/OGS has it as well, or at
| >> least has been working on it.
| >>
| >> Ian
| >>
| >> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
| >>> Hi,
| >>>
| >>> Is there any plan to implement GPU resource management in SGE in
| >>> the near future, like Slurm or Torque have? There are some ways to do
| >>> this using scripts/programs, but I wonder whether SGE itself can
| >>> recognize and manage GPUs (and Phi). It doesn't need to be complicated
| >>> or powerful, just do the basic work.
| >>>
| >>> Thanks,
| >>
| >>
| >>
| >> --
| >> Ian Kaufman
| >> Research Systems Administrator
| >> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
|
|
|
| --
| Ian Kaufman
| Research Systems Administrator
| UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
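[Editor's note: for readers without access to the linked page, a minimal sketch of the consumable-resource approach Ian describes, using standard qconf commands. The host name "gpu-node-01" and the count of 2 GPUs are placeholders, not details from this thread.]

    # 1. Add a consumable complex named "gpu" (edit the table shown by "qconf -mc"):
    #    name  shortcut  type  relop  requestable  consumable  default  urgency
    #    gpu   gpu       INT   <=    YES          YES          0        0

    # 2. Declare how many GPUs a given execution host offers ("qconf -me gpu-node-01"):
    #    complex_values   gpu=2

    # 3. Request a GPU at submission time; GE decrements the per-host count
    #    while the job runs, so no more jobs land once the count hits zero:
    qsub -l gpu=1 my_gpu_job.sh

This counts GPUs per host but, as Feng notes, does not by itself tell a job which physical device it was given.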
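[Editor's note: on Feng's point about CUDA_VISIBLE_DEVICES, one common workaround is a small wrapper invoked from the job script (or hooked in as a starter_method) that picks a device just before the payload starts. This is only a sketch under the assumption that nvidia-smi is on the PATH, and it shares the weakness Feng describes: two jobs starting at the same instant can still race for the same device.]

    #!/bin/bash
    # Sketch of a job wrapper: expose only the least-utilized GPU to the payload.
    # Device choice is best-effort, not a reservation recorded by GE.
    idx=$(nvidia-smi --query-gpu=index,utilization.gpu \
                     --format=csv,noheader,nounits \
          | sort -t, -k2 -n | head -n1 | cut -d, -f1)
    export CUDA_VISIBLE_DEVICES=$idx
    exec "$@"   # run the real job command with only that GPU visible

Usage would be something like "gpu_wrap.sh ./my_cuda_binary args..." inside the job script submitted with "-l gpu=1" as above.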