Thanks, Reuti. The socket solution looks like it only works well for serial jobs, not PE jobs, right?
Our cluster has different kinds of nodes: some have 2 GPUs each, others have 4. Most user jobs are PE jobs; some are serial. The socket solution can even work for PE jobs, but as I understand it, it is not efficient. Each node has, for example, 4 queues; if a user submits a PE job to one queue, he/she cannot use the GPUs belonging to the other queues?

On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> Am 14.04.2014 um 20:06 schrieb Feng Zhang:
>
>> Thanks, Ian!
>>
>> I haven't checked the GPU load sensor in detail, either. It sounds to
>> me like it only handles the number of GPUs allocated to a job, but the
>> job doesn't know which GPUs it actually got, so it cannot set
>> CUDA_VISIBLE_DEVICES (some programs need this env to be set). This can
>> be done by writing some scripts/programs, but to me it is not an
>> accurate solution, since some jobs may still happen to collide with
>> each other on the same GPU of a multi-GPU node. If GE had the
>> memory to record the GPUs allocated to a job, that would be
>> perfect.
>
> Like the option to request sockets instead of cores which I posted in the
> last couple of days, you can use a similar approach to get the number of the
> granted GPU out of the queue name.
>
> -- Reuti
>
>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>> I believe there already is support for GPUs - there is a GPU load
>>> sensor in Open Grid Engine. You may have to build it yourself; I
>>> haven't checked whether it comes pre-packaged.
>>>
>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>> least has been working on it.
>>>
>>> Ian
>>>
>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Is there any plan to implement GPU resource management in SGE in
>>>> the near future, like Slurm or Torque?
>>>> There are some ways to do this
>>>> using scripts/programs, but I wonder whether SGE itself can
>>>> recognize and manage GPUs (and Phi). It doesn't need to be
>>>> complicated or powerful, just do the basic work.
>>>>
>>>> Thanks,
>>>> _______________________________________________
>>>> users mailing list
>>>> users@gridengine.org
>>>> https://gridengine.org/mailman/listinfo/users
>>>
>>>
>>>
>>> --
>>> Ian Kaufman
>>> Research Systems Administrator
>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
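For what it's worth, a minimal sketch of the queue-name approach Reuti describes might look like the following. It assumes a hypothetical naming scheme with one queue per GPU (e.g. gpu.0.q, gpu.1.q, ...) and relies on SGE exporting the granted queue name to the job in $QUEUE; the queue names and prolog placement are assumptions, not a tested setup:

```shell
#!/bin/sh
# Sketch of a job prolog/wrapper: derive the GPU index from the queue
# name and export it as CUDA_VISIBLE_DEVICES.
# Assumes queues are named "gpu.<index>.q" (one queue per GPU) and that
# SGE sets $QUEUE to the name of the granted queue.

QUEUE=${QUEUE:-gpu.0.q}    # set by SGE for the job; default only for testing

# Strip the "gpu." prefix and the ".q" suffix to get the device index.
gpu_index=${QUEUE#gpu.}
gpu_index=${gpu_index%.q}

export CUDA_VISIBLE_DEVICES="$gpu_index"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

Since each queue maps to exactly one GPU, two jobs granted different queues can no longer collide on the same device, though, as noted above, a PE job holding one queue would not see the node's other GPUs.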