You might find the instructions here useful for implementing Ian's approach:

  http://sgowtham.com/journal/2012/12/18/sge-scheduling-gpu-jobs-on-rocks-5-4-2/

I have had the same instructions work for me on Rocks 6.1.1 running GE 
2011.11p1.
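
On Feng's point below that a job does not know which GPU it was given: a
consumable resource only counts GPUs per host, so something still has to pick
a device index. Here is a minimal sketch of one way to do that with per-GPU
lock files, written in Python. It is not taken from the instructions linked
above; the lock directory, the function names and the GPU count are all
hypothetical, and it assumes every GPU job on the node goes through the same
wrapper.

```python
import os


def claim_gpu(lock_dir, num_gpus, job_id):
    """Claim the first free GPU index on this host, or return None.

    Assumption: one lock file per GPU in lock_dir (a hypothetical,
    node-local directory). O_CREAT | O_EXCL makes the creation atomic,
    so two jobs starting at the same time cannot claim the same index.
    """
    os.makedirs(lock_dir, exist_ok=True)
    for gpu in range(num_gpus):
        lock_path = os.path.join(lock_dir, f"gpu{gpu}.lock")
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # this GPU is already claimed by another job
        with os.fdopen(fd, "w") as fh:
            fh.write(str(job_id))  # record the owner for later debugging
        return gpu
    return None


def release_gpu(lock_dir, gpu):
    """Release a GPU claimed earlier; a missing lock file is ignored."""
    try:
        os.remove(os.path.join(lock_dir, f"gpu{gpu}.lock"))
    except FileNotFoundError:
        pass
```

A job wrapper could call claim_gpu() before starting the real program, export
the result with os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu), and call
release_gpu() once the program exits. If a job is killed before it releases
its lock, the stale file has to be cleaned up some other way, so please treat
this as a starting point only.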

Best regards,
g

--
Gowtham, PhD
HPC Research Scientist, ITS
Adj. Asst. Professor, Physics/ECE
Michigan Technological University

(906) 487/3593
http://it.mtu.edu
http://hpc.mtu.edu


On Mon, 14 Apr 2014, Ian Kaufman wrote:

| I think you can make it a consumable resource, such that once a
| specific GPU on a specific host is in use, no other jobs can land on
| it.
| 
| Ian
| 
| On Mon, Apr 14, 2014 at 11:06 AM, Feng Zhang <prod.f...@gmail.com> wrote:
| > Thanks, Ian!
| >
| > I haven't checked the GPU load sensor in detail, either. It sounds to
| > me like it only tracks the number of GPUs allocated to a job; the job
| > doesn't know which GPUs it actually got, so it cannot set
| > CUDA_VISIBLE_DEVICES (some programs need this variable to be set). This
| > can be done by writing some scripts/programs, but to me that is not a
| > reliable solution, since jobs may still collide with each other on the
| > same GPU on a multi-GPU node. If GE could keep a record of the GPUs
| > allocated to each job, that would be perfect.
| >
| >
| > On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
| >> I believe there already is support for GPUs - there is a GPU Load
| >> Sensor in Open Grid Engine. You may have to build it yourself, I
| >> haven't checked to see if it comes pre-packaged.
| >>
| >> Univa has Phi support, and I believe OGE/OGS has it as well, or at
| >> least has been working on it.
| >>
| >> Ian
| >>
| >> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
| >>> Hi,
| >>>
| >>> Is there any plan to implement GPU resource management in SGE in the
| >>> near future, as in Slurm or Torque? There are some ways to do this
| >>> using scripts/programs, but I wonder whether SGE itself could
| >>> recognize and manage GPUs (and Phi). It does not need to be complicated
| >>> or powerful; it just needs to do the basic work.
| >>>
| >>> Thanks,
| >>> _______________________________________________
| >>> users mailing list
| >>> users@gridengine.org
| >>> https://gridengine.org/mailman/listinfo/users
| >>
| >>
| >>
| >> --
| >> Ian Kaufman
| >> Research Systems Administrator
| >> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu