Again, look into using it as a consumable resource, as Gowtham posted above.
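For reference, a minimal sketch of the consumable-resource setup (the complex name `gpu`, host names, and GPU counts are illustrative, not from this thread):

```shell
# Sketch: track GPUs as a consumable complex (names/counts illustrative).
#
# 1) Add a "gpu" entry to the complex configuration (qconf -mc), e.g.:
#      #name  shortcut  type  relop  requestable  consumable  default  urgency
#      gpu    gpu       INT   <=     YES          YES         0        0
#
# 2) Tell each exec host how many GPUs it physically has:
qconf -aattr exechost complex_values gpu=4 node01   # a 4-GPU node
qconf -aattr exechost complex_values gpu=2 node02   # a 2-GPU node

# 3) Jobs then request GPUs like any other consumable; SGE decrements
#    the per-host count and queues jobs when no GPU is free:
qsub -l gpu=1 my_gpu_job.sh
```

Note this only counts GPUs per host; it does not tell the job *which* device it was granted, which is the CUDA_VISIBLE_DEVICES problem discussed in the thread below.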
Ian

On Mon, Apr 14, 2014 at 11:57 AM, Feng Zhang <[email protected]> wrote:
> Thanks, Reuti,
>
> The socket solution looks like it only works for serial jobs, not PE
> jobs, right?
>
> Our cluster has different nodes: some nodes have 2 GPUs each, others
> have 4 GPUs each. Most of the user jobs are PE jobs; some are serial.
>
> The socket solution can even work for PE jobs, but as I understand it,
> it is not efficient, since each node has, for example, 4 queues. If one
> user submits a PE job to a queue, he/she cannot use the other GPUs on
> the other queues?
>
> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <[email protected]> wrote:
>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>>
>>> Thanks, Ian!
>>>
>>> I haven't checked the GPU load sensor in detail, either. It sounds to
>>> me like it only handles the number of GPUs allocated to a job, but the
>>> job doesn't know which GPUs it actually gets and doesn't set
>>> CUDA_VISIBLE_DEVICES (some programs need this env to be set). This can
>>> be done by writing some scripts/programs, but to me it is not an
>>> accurate solution, since some jobs may still happen to collide with
>>> each other on the same GPU on a multi-GPU node. If GE had the memory
>>> to record the GPUs allocated to a job, then this would be perfect.
>>
>> Like the option to request sockets instead of cores which I posted in
>> the last couple of days, you can use a similar approach to get the
>> number of the granted GPU out of the queue name.
>>
>> -- Reuti
>>
>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <[email protected]> wrote:
>>>> I believe there already is support for GPUs - there is a GPU Load
>>>> Sensor in Open Grid Engine. You may have to build it yourself; I
>>>> haven't checked to see if it comes pre-packaged.
>>>>
>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>> least has been working on it.
>>>>
>>>> Ian
>>>>
>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>> the near future, like Slurm or Torque? There are some ways to do
>>>>> this using scripts/programs, but I wonder whether SGE itself can
>>>>> recognize and manage GPUs (and Phi). It does not need to be
>>>>> complicated or powerful, just do the basic work.
>>>>>
>>>>> Thanks,

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
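Reuti's queue-name trick above can be sketched roughly as follows (the `gpuN.q` naming scheme and the prolog wiring are assumptions, not spelled out in the thread): define one cluster queue per GPU, then have a queue prolog derive the device index from the queue name and export CUDA_VISIBLE_DEVICES so the job only sees its granted device.

```shell
#!/bin/sh
# Sketch: derive the granted GPU from the queue name (assumed gpuN.q).
# SGE exports $QUEUE to the prolog/job environment; a fallback value is
# used here so the snippet can also be run standalone.
QUEUE="${QUEUE:-gpu2.q}"

# Strip the "gpu" prefix and ".q" suffix to get the device index.
gpu_id=$(printf '%s\n' "$QUEUE" | sed 's/^gpu\([0-9][0-9]*\)\.q$/\1/')

# Pin the job to the single GPU this queue represents.
CUDA_VISIBLE_DEVICES="$gpu_id"
export CUDA_VISIBLE_DEVICES
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

As Feng notes, this ties one queue to one GPU, so a PE job confined to one queue cannot pick up the idle GPUs behind the node's other queues.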
